market basket data has recently been an important research topic in the area of knowledge discovery from 
databases. It was originally introduced in [2] and studied extensively in [1, 5, 25, 26, 31, 19, 23, 29, 30, 3, 4, 
33, 14]. The problem is typically examined in the context of discovering buying patterns from retail sales 
transactions. Although there are many similar data mining applications which can be modelled in this way, we 
again study the problem using the retail store example because of its intuitive nature .... 

....of share for itemsets, and redefine the notions of frequent itemsets and confidence. We refer to this extended 
formalism as the share-confidence framework for association rules and refer to the new itemset measures as 
simply share measures. In this framework, any of the algorithms presented in [2, 3, 16, 19, 22, 23, 29, 30, 31, 
32, 33] can be used to generate frequent itemsets using our new definition for frequent itemset. The 
definitions in this section have been implemented in a data mining system for analyzing market basket data. This 
system is an extension of DB Discover, a software tool for knowledge discovery from databases .... 

M. Houtsma and A. Swami. Set-oriented mining of association rules. In Proceedings of the 11th International 
Conference on Data Engineering (ICDE'95), pages 25-34, 1995. 



An Algorithm for Mining Association Rules Using Perfect.. - Ozel, Guvenir (Correct) 

....is greater than or equal to the minimum confidence. However, the first step of association rule mining, finding 
the frequent itemsets, is a very resource-consuming task, and for that reason it has been one of the most popular 
research fields in data mining. Several algorithms, AIS [3], SETM [8], Apriori [4], Direct Hashing and Pruning 
[5, 9], Partition [10], Sampling [11], and some other parallel algorithms [12], have been developed. In this 
study, a fast algorithm based on the Direct Hashing and Pruning (DHP) algorithm is proposed. The DHP algorithm is 
described in Section II, our algorithm .... 

M. Houtsma and A. Swami, "Set-Oriented Mining of Association Rules", Research Report RJ 9567, IBM Almaden 
Research Center, San Jose, California, (Oct. 1993). 



Intension Mining: A New Paradigm in Knowledge Discovery - Gupta, Bhatnagar, Wasan.. (2000) (Correct) 

....cation with no change in semantics. Data Mining algorithms operate on previously selected, cleaned and 
transformed data. The choice of the mining algorithm depends on the type of knowledge to be discovered. 
Intra-record links, termed Association Rules, can be discovered using algorithms given in [4, 5, 6, 29, 31, 35, 41] 
etc. Database segmentation can be performed using various clustering techniques [18, 28, 37, 38, 51] etc. 
Classification can be performed by inducing either a decision tree or decision rule [1, 3, 8, 21, 23] or by neural network 
techniques [32]. Presentation of the Discovered Knowledge is the .... 

M. Houtsma and A. Swami. Set-oriented Mining of Association Rules. In Proceedings of the Int'l Conf. on Data 
Engineering, pages 25-33, 1995. 



Discovering Interesting Association Rules in Medical Data - Ordonez, Santana, de Braal (2000) (Correct) 

....are used with basket data. Medical data sets are more complex and thus present many new challenges. This 
paper incorporates some ideas from our previous work to mine rules on segmented images [16]. Most 
papers published in the database literature concentrate on optimizing the first phase [18, 7, 12, 13, 14, 19, 
17], but a few look at the problem of also improving rule generation (2nd phase) [6, 18, 15]. For instance, [14] 
proposes an algorithm to summarize associations when there are too many. [7] attacks the problem of 
inserting transactions on an already mined set and proposes an algorithm that .... 

M. Houtsma and A. Swami. Set-oriented mining of association rules. Technical Report RJ 9567, IBM, October 
1993. 



Fast Algorithms for Mining Association Rules - Agrawal, Srikant (1994) (848 citations) (Correct) 

....example, D could be a data file, a relational table, or the result of a relational expression. An algorithm for 
finding all association rules, henceforth referred to as the AIS algorithm, was presented in [AIS93b]. 
Another algorithm for this task, called the SETM algorithm, has been proposed in [HS93]. In this paper, we 
present two new algorithms, Apriori and AprioriTid, that differ fundamentally from these algorithms. We 
present experimental results, using both synthetic and real life data, showing that the proposed algorithms always 
outperform the earlier algorithms. The performance gap is .... 

....person who orders a comforter also orders a flat sheet, a fitted sheet, a pillow case, and a ruffle. The 
algorithms in Section 3 generate such multi-consequent rules. In Section 4, we show the relative performance 
of the proposed Apriori and AprioriTid algorithms against the AIS [AIS93b] and SETM [HS93] algorithms. 
To make the paper self-contained, we include an overview of the AIS and SETM algorithms in this section. We 
also describe how the Apriori and AprioriTid algorithms can be combined into a hybrid algorithm, AprioriHybrid, 
and demonstrate the scale-up properties of this algorithm. We .... 

[Article contains additional citation context not shown here] 

Maurice Houtsma and Arun Swami. Set-oriented mining of association rules. Research Report RJ 9567, IBM 
Almaden Research Center, San Jose, California, October 1993. 



SQL Based Association Rule Mining using Commercial.. - Yoshizawa.. (2000) (Correct) 

....and credit card fraud indications are widely recognized. One method of data mining is finding association rules 
[1]. Basket data analysis is typical of this method. There are some approaches proposed to mine association 
rules [1,2,6,9]; some of them are based on the relational database standard SQL [3,7,8]. But this kind of mining is 
known as a CPU-power-demanding application, and it has to handle very large amounts of transaction data. 
Unfortunately, the SQL approach is reported to have a drawback in performance, although it has many advantages 
such as seamless integration with existing systems and high .... 

....required by association rule mining. This fact motivated us to examine how efficiently SQL based 
association rule mining can be parallelized and sped up using a commercial parallel database system 
(IBM DB2 UDB EEE). We propose two techniques to enhance association rule mining queries based on 
SETM [3]. We have also compared the performance with a commercial mining tool (IBM Intelligent Miner). Our 
performance evaluation shows that we can achieve comparable performance with the commercial mining tool using 
only 4 nodes. Some considerable works on effective SQL queries to mine association rule .... 

[Article contains additional citation context not shown here] 

M. Houtsma, A. Swami. Set-oriented Mining of Association Rules. In Proc. of International Conference on Data 
Engineering (ICDE), 1995. 



Extended Concepts for Association Rule Discovery - Rantzau (1997) (Correct) 

....at most twice. In the first scan it generates all candidates and in the second their support is computed. Apriori 
outperforms Partition only when the minimum support threshold is set high. The Partition algorithm lends itself to 
an implementation on parallel computers. The SETM algorithm [HS95, AS94a] uses SQL to generate frequent 
itemsets. Like AIS, candidates are generated while transactions are read from the database. However, SETM 
separates candidate generation from counting. It has a worse performance than AIS for both synthetic and real 
life datasets. The algorithm presented in .... 

Maurice Houtsma and Arun Swami. Set-oriented Mining of Association Rules. In Proceedings of the 11th 
International Conference on Data Engineering, Taipei, Taiwan, pages 25-33, March 1995. 



Performance Evaluation and Optimization of Join Queries.. - Thomas, Chakravarthy (1998) (2 citations) 
(Correct) 

....object-relational extensions to execute mining operations. This entails transforming the mining operations into 
database queries and in some cases developing newer techniques that are more appropriate in the database 
context. The UDF based (user defined function) approach in [2], the SETM algorithm [5], the formulation of 
association rule mining as query flocks [10], and SQL queries for mining [9] all belong to this category. Two 
categories of SQL implementations for association rule mining, one based purely on SQL-92 and the other using 
the object-relational extensions to SQL (SQL-OR), are .... 

... = q.tid and q.tid = r.tid. We can also use the Subquery approach to generate T_3 if that is less expensive. T_3 
will contain exactly the same tuples produced by subquery Q_3. The Set-oriented Apriori algorithm bears 
some resemblance to the three-way join approach in [9], the SETM algorithm in [5], and the AprioriTid 
algorithm in [3]. In the three-way join approach, the temporary table T_k stores, for each transaction, the 
identifiers of the candidates it supported. T_k is generated by joining two copies of T_{k-1} with C_k. The 
generation of F_k requires a further join of T_k with C_k. The .... 

[Article contains additional citation context not shown here] 

M. Houtsma and A. Swami. Set-oriented mining of association rules. In Int'l Conference on Data Engineering, 
Taipei, Taiwan, March 1995. 



Algorithms For Computing Association Rules Using A.. - Graham Goulbourne Frans (2000) (1 citation) (Correct) 

....support for all members of C_k, and from this, produces the set L_k of interesting sets of size k. This is then 
used to derive the candidate sets C_{k+1}, using the downward closure property, that all the subsets of any member 
of C_{k+1} must be members of L_k. Other algorithms, AIS [1] and SETM [3], have the same general form but 
differ in the way the candidate sets are derived. Two aspects of the performance of these algorithms are of 
concern: the number of passes of the database that are required, which will in general be one greater than the 
number of attributes in the largest .... 

Houtsma, M. and Swami, A. Set-oriented mining of association rules. Research Report RJ 9567, IBM Almaden 
Research Centre, San Jose, October 1993. 
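
The level-wise scheme quoted in the excerpt above (count the candidates of size k, keep the frequent ones, then derive the next candidates using downward closure) can be illustrated with a minimal Python sketch. This is not code from any of the cited papers; the names `apriori_gen`, `frequent_itemsets`, and `min_count` are assumptions introduced here for illustration only.

```python
from itertools import combinations


def apriori_gen(frequent_k, k):
    """Derive candidate (k+1)-itemsets from the frequent k-itemsets.

    A candidate is kept only if every one of its k-subsets is frequent
    (the downward closure property).
    """
    frequent_k = set(frequent_k)
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in frequent_k for sub in combinations(union, k)
            ):
                candidates.add(union)
    return candidates


def frequent_itemsets(transactions, min_count):
    """Return {itemset: count} for all itemsets found in >= min_count transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that occurs anywhere.
    current = {frozenset([item]) for t in transactions for item in t}
    k = 1
    result = {}
    while current:
        # One pass over the data per level to count the candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_count}
        result.update(level)
        current = apriori_gen(level.keys(), k)
        k += 1
    return result
```

Note that the sketch makes one counting pass per level plus a final pass in which no candidate survives, which matches the excerpt's remark that the number of passes is generally one greater than the size of the largest frequent set.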



Performance Evaluation and Optimization of Join Queries. . - Thomas, Chakravarthy (1998) (2 citations) 
(Correct) 

....extensions to execute mining operations. This entails transforming the mining operations into database queries 
and in some cases developing newer techniques that are more appropriate in the database context. The UDF 
based (user defined function) approach in [AS96], the SETM algorithm [HS95], the formulation of 
association rule mining as query flocks [TUA 98], and SQL queries for mining [STA98] all aim at tighter 
database integration. [STA98] presents a detailed study of the various architectural alternatives for mining data 
stored in a DBMS. It has been reported that for .... 

....Section 3.3. Figure 11: Comparison of Subquery and Set-oriented Apriori approaches. In Figure 11, we show 
the relative performance of Subquery and Set-oriented Apriori approaches for the two datasets. The chart shows 
the total time taken for each of the different passes. We ran the SETM algorithm [HS95] also for a few support 
values and found that it is an order of magnitude slower. Set-oriented Apriori performs better than Subquery 
for all the support values. The first two passes of both the approaches are similar and they take approximately 
equal amount of time. The difference between .... 

Maurice Houtsma and Arun Swami. Set-oriented mining of association rules. In Int'l Conference on Data 
Engineering, Taipei, Taiwan, March 1995. 



Incremental Mining of Constrained Associations - Thomas, Chakravarthy (1998) (6 citations) (Correct) 

....frequent itemsets) generated at each level is based on the observation that if an itemset S appears in c 
baskets, then any subset of S appears in at least c baskets. The need for applying association rule mining to 
data stored in databases/data warehouses has motivated researchers to [SK97, HS95, AS96, STA98, 
TC98, TS98] i) study alternative architectures for mining over data stored in databases, ii) translate 
association rule mining algorithms to work with relational and object relational databases, iii) optimize the 
mining algorithms beyond what the current relational query optimizers .... 

Maurice Houtsma and Arun Swami. Set-oriented mining of association rules. In Int'l Conference on Data 
Engineering, Taipei, Taiwan, March 1995. 

Parallel Mining of Association Rules - Agrawal, Shafer (1996) (52 citations) (Correct) 

....data mining is that it will deliver technology that will enable development of a new breed of decision support 
applications. Discovering association rules is an important data mining problem [1]. Recently, there has 
been considerable research in designing fast algorithms for this task [1] [3] [5] [6] [8] [12] [9] [11]. However, 
with the exception of [10], the work so far has been concentrated on designing serial algorithms. Since the 
databases to be mined are often very large (measured in gigabytes and even terabytes), parallel algorithms are 
required. We present in this paper three parallel .... 

....upon the patterns the different transactions support. This algorithm also incorporates load balancing. These 
algorithms are based upon the serial algorithm Apriori, which was first presented in [3]. We chose the 
Apriori algorithm because of its superior performance over the earlier algorithms [1] [6], as shown in [3]. 
We preferred Apriori over AprioriHybrid, a somewhat faster algorithm in [3], because AprioriHybrid is 
harder to parallelize; the performance of AprioriHybrid is sensitive to heuristically determined 
parameters. Furthermore, the performance of Apriori can be made to approximate .... 

[Article contains additional citation context not shown here] 

Maurice Houtsma and Arun Swami. Set-oriented mining of association rules. In Int'l Conference on Data 
Engineering, Taipei, Taiwan, March 1995. 



Parallel SQL Based Association Rule Mining on.. - Pramudiono.. (1999) (Correct) 

....called large itemsets. 2. Generate the desired rules using large itemsets. Since the first step consumes most 
of the processing time, development of mining algorithms has been concentrated on this step. In our experiment we 
employed an ordinary standard SQL query that is similar to the SETM algorithm [3]. It is shown in figure 1. 
CREATE TABLE SALES (id int, item int) 
PASS 1 
CREATE TABLE C_1 (item_1 int, cnt int) 
CREATE TABLE R_1 (id int, item_1 int) 
INSERT INTO C_1 SELECT item AS item_1, COUNT(*) FROM SALES GROUP BY item HAVING COUNT(*) >= min_support; 
INSERT INTO R_1 SELECT .... 

M. Houtsma, A. Swami. Set-oriented Mining of Association Rules. In Proc. of International Conference on Data 
Engineering, 1995. 

Pincer-Search: An Efficient Algorithm for Discovering the.. - Lin, Kedem (1999) (Correct) 

....Frequent Set Discovery We briefly discuss existing frequent set discovery algorithms in a roughly chronological 
order. AIS and SETM Algorithms: The problem of association rule mining was first introduced in [2]. An algorithm 
called AIS was given for discovering the frequent set. The SETM algorithm [13] was later designed to use only 
standard SQL commands to find the frequent set. The Apriori algorithm [3], described above, performs much 
better than AIS and SETM. The OCD Algorithm: It is worth adding that, concurrently with the Apriori algorithm, 
OCD algorithm [19] used the same closure property .... 

M. Houtsma and A. Swami. Set-oriented mining of association rules. Research Report RJ 9567, IBM Almaden 
Research Center, Oct. 1993. 



Computing Association Rules Using Partial Totals - Graham Goulbourne Frans (2001) (2 citations) (Correct) 

....support for all members of C_k, and from this, produces the set L_k of interesting sets of size k. This is then 
used to derive the candidate sets C_{k+1}, using the downward closure property, that all the subsets of any member 
of C_{k+1} must be members of L_k. Other algorithms, AIS [1] and SETM [3], have the same general form but 
differ in the way the candidate sets are derived. Two aspects of the performance of these algorithms are of 
concern: the number of passes of the database that are required, which will in general be one greater than the 
number of attributes in the largest .... 

Houtsma, M. and Swami, A. Set-oriented mining of association rules. Research Report RJ 9567, IBM Almaden 
Research Centre, San Jose, October 1993. 



Fast Algorithms for Discovering the Maximum Frequent Set - Lin (1998) (Correct) 

....Itemsets {1,2,3,5} and {1,2,5} were not considered, since the item 5 was not in the transaction. Two 
complicated heuristics, remaining tuples optimization and pruning function optimization, were used to prune 
candidates. Unfortunately, this algorithm still generates too many candidates. The SETM [HS93] algorithm was 
later designed to use only standard SQL commands to find the frequent set. However, like AIS, SETM also 
creates candidates on the fly while reading the database. Both algorithms are not efficient, since they generate 
and count too many unnecessary candidates. 2.4.2 Apriori and OCD .... 

M. Houtsma and A. Swami. Set-oriented mining of association rules. Research Report RJ 9567, IBM Almaden 
Research Center, Oct. 1993. 



Integrating Data Mining with Relational DBMS: A.. - Nestorov, Tsur (1999) (1 citation) (Correct) 

....that requires consideration: can we achieve a comparable, or at least an acceptable level of 
performance from these integrated methods when compared to the special purpose external methods? 
This question was previously examined in a more narrow context of association rules and a particular 
DBMS in [7] and [2]. Section 2 of this paper elaborates on the general architectural choices available and 
their comparison. The idea of flocks [11] was presented as a framework for performing complex data analysis 
tasks on relational database systems. The method consists of a generator of candidate query .... 

M. Houtsma and A. Swami. Set-oriented mining of association rules. In Proceedings of International Conference 
on Data Engineering, pages 25-33, Taipei, Taiwan, March 1995. 



Efficient Mining for Association Rules with Relational.. - Rajamani, Cox, Iyer, al. (1999) (3 citations) (Correct) 

....With the (Transaction id, Item) schema the Transaction id value would be repeated for every item bought in that 
transaction. The SC data model would be useful for performing conventional relational queries against items 
bought in transactions. Some of the early work in association rule mining [11] proposes the use of such 
relational queries for discovering association rules, and works with this data model. However, later work [3] 
has shown significant performance improvement by using Apriori based algorithms that did not use relational 
queries in their implementation. But, to the best of .... 

....proposed for interoperability in multi database systems and not for providing the flexibility and functionality 
required by data mining applications. Agrawal and Shim [2] show the benefit of using UDFs for the development 
of applications tightly coupled with the database engine. Houtsma and Swami [11] proposed SETM, an SQL 
based algorithm for association rule mining. Their algorithm uses simple database operations: sorting and 
merge-scan joins. However, their joins are more expensive as they are against the input data table and they do 
not have an efficient candidate set pruning such as .... 

M. Houtsma and A. Swami. Set-oriented Mining of Association Rules. Technical Report RJ 9567, IBM Almaden 
Research Center, October 1993. 



Mining Association Rules - Cengiz (Correct) 

....with predefined template patterns. If a found rule matches any pattern then it is said to be interesting, 
otherwise uninteresting. 4 Serial Algorithms: Since the introduction of the problem [AIS93], there has been 
work on designing algorithms for mining Boolean association rules [AS94] [HS95] [MTV94] [SON95] 
[PCY95]. The algorithm presented in [AIS93] incorporates buffer management, novel estimation, and 
pruning techniques to provide efficiency. This algorithm finds only Boolean association rules. This work was 
later extended to cover quantitative rules [SA95] [HF95]. Those works try .... 

Maurice Houtsma and Arun Swami. Set-oriented mining of association rules. In Int'l Conference on Data 
Engineering, Taipei, Taiwan, March 1995. 



Mining for Complex Patterns in Large Databases Using.. - Jermaine, Miller (1998) (Correct) 

....the model representations upon which they rely. The limitations include restrictions on the type and 
characteristics of the data, limiting the accuracy and completeness of the models. 3.1 Association Rules 
Association rules have been the most prevalent mining method in the database literature [1, 2, 3, 4, 9, 11, 
12, 14, 15, 20, 22, 23]. While very useful in describing certain large scale trends or relationships among the data 
in very large datasets, association rules are unable to describe certain relationships that have very strong 
statistical significance but occur in only a relatively small subset of the data precisely .... 

M. Houtsma and A. Swami. Set-oriented mining of association rules. In Proceedings of the International 
Conference on Data Engineering, pages 25-34, 1995. 



Parallel Data Mining for Association Rules on.. - Zaki, Ogihara.. (1996) (30 citations) (Correct) 

....large itemsets, and in the second pass the support for all these is measured. The above algorithms are all 
specialized black box techniques which do not use any database operations. Algorithms using only general 
purpose DBMS systems and relational algebra operations have also been proposed [7, 8]. The work 
closest to this from the machine learning literature is the KID3 algorithm presented in [12]. The main problem with 
their approach is that it may take exponential time in the worst case as opposed to the polynomial time algorithms 
presented in the above papers. There has been very .... 

M. Houtsma and A. Swami. Set-oriented mining of association rules. In RJ 9567. IBM Almaden, Oct. 1993. 



Formal Logics of Discovery and Hypothesis Formation By Machine - Hájek, Holeňa (Correct) 

....tables [5, 9, 31, 32, 55, 56]. Scope. GUHA relates, in particular, to mining association rules. Indeed, if A = {A_1, 
..., A_m} is the set of binary attributes in a database of size k, and if X, Y ⊂ A, X ∩ Y = ∅, then the association 
rule X ⇒ Y is significant in the database (according to [1, 2, 27, 28, 40, 47, 54]) if and only if the GUHA 
sentence ⋀_{i∈X} A_i =_{B,p} ⋀_{i∈Y} A_i holds for the k × m dichotomous data matrix formed by the values of the 
attributes from A. Here, =_{B,p} is a founded version of the generalized quantifier =_p mentioned in section 2 
(version requiring the frequency a to be at .... 

Houtsma, M., and Swami, A. Set-oriented mining of association rules. Tech. rep., IBM Almaden Research Center, 
1993. 



Beyond Market Baskets: Generalizing Association Rules.. - Silverstein, Brin.. (1997) (40 citations) (Correct) 

... properties, text mining 1. Introduction One particularly well studied problem in data mining is the search 
for association rules in market basket data (Agrawal et al. 1993a, Agrawal et al. 1993b, Klemettinen et al. 
1994, Mannila et al. 1994, Agrawal and Srikant, 1994, Han and Fu, 1995, Houtsma and Swami, 1995, Park et 
al. 1995, Srikant and Agrawal, 1995, Savasere et al. 1995, Agrawal et al. 1996, Toivonen, 1996). In this 
setting, the base information consists of register transactions of retail stores. The goal is to discover buying 
patterns such as two or more items that are bought together often 

Houtsma, M. and Swami, A. 1995. Set-oriented mining of association rules. Proceedings of the International 
Conference on Data Engineering, pp. 25-34. 



Discovering all Most Specific Sentences by Randomized.. - Gunopulos. al. (1997) (31 citations) (Correct) 

....these candidates, the algorithm calls the function q to check whether φ really belongs to Th. This iterative 
procedure is performed until no more sentences in Th are found. This level-wise algorithm has been used in 
various forms in finding association rules, episodes, sequential rules, etc. [2, 3, 24, 23, 1, 13, 14, 25]. The 
drawback with this algorithm is that it always computes the whole set Th(L, r, q) even in the cases where a 
condensed representation of Th using most specific sentences would be useful. Given Th, a sentence φ ∈ Th is a 
most specific sentence of Th, if for no θ ∈ Th we have φ ≺ θ. Denote .... 

M. Houtsma and A. Swami. Set-oriented mining of association rules. Research Report RJ 9567, IBM Almaden 
Research Center, San Jose, California, October 1993. 



Query Flocks: A Generalization of Association-Rule Mining - Tsur, Ullman.. (1998) (43 citations) (Correct) 

....answer to a question like find all the pairs of items that appear in at least c market baskets. 1.3 The Problem 
With SQL as a Mining Language In principle, we can express a query about pairs of items that appear in a large 
number of baskets in conventional SQL. This approach was examined by [HS95], for instance. The problem is 
that the right optimizations are beyond the state of the art in commercial database systems. For example, Fig. 1 
shows how to express the query find all pairs of items that appear together in at least 20 market baskets. 1 There, 
we join baskets with itself, .... 

M. Houtsma and A. Swami, "Set-oriented mining of association rules" Proc. Intl. Conf. on Data Engineering, pp. 
25-34. 
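
The conventional-SQL formulation mentioned in the excerpt above (join the basket relation with itself and keep the item pairs that occur in enough baskets) can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch, not the query from the paper's Fig. 1, which is not reproduced in the excerpt; the table name `baskets`, the columns `bid` and `item`, and the helper `frequent_pairs` are assumptions made here. It also assumes that each (basket, item) row appears only once.

```python
import sqlite3


def frequent_pairs(rows, min_baskets):
    """Return (item1, item2, count) for pairs appearing in >= min_baskets baskets.

    rows: iterable of (basket_id, item) tuples, one row per item in a basket.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE baskets (bid INTEGER, item TEXT)")
    conn.executemany("INSERT INTO baskets VALUES (?, ?)", rows)
    # Self-join on the basket id; b1.item < b2.item lists each pair once.
    query = """
        SELECT b1.item, b2.item, COUNT(*) AS n
        FROM baskets AS b1
        JOIN baskets AS b2
          ON b1.bid = b2.bid AND b1.item < b2.item
        GROUP BY b1.item, b2.item
        HAVING COUNT(*) >= ?
        ORDER BY b1.item, b2.item
    """
    result = conn.execute(query, (min_baskets,)).fetchall()
    conn.close()
    return result
```

The self-join materialises every item pair of every basket before the HAVING filter is applied, which gives a feel for why the excerpt regards the right optimizations as the hard part.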



Query Flocks: A Generalization of Association-Rule Mining - Dick Tsur (1998) (43 citations) (Correct) 

....answer to a question like find all the pairs of items that appear in at least c market baskets. 1.3 The Problem 
With SQL as a Mining Language In principle, we can express a query about pairs of items that appear in a large 
number of baskets in conventional SQL. This approach was examined by [HS95], for instance. The problem is 
that the right optimizations are beyond the state of the art in commercial database systems. For example, Fig. 1 
shows how to express the query find all pairs of items that appear together in at least 20 market baskets. 1 There, 
we join baskets with itself, .... 

M. Houtsma and A. Swami, "Set-oriented mining of association rules" Proc. Intl. Conf. on Data Engineering, pp. 
25-34. 



Mining for Strong Negative Associations in a Large Database.. - Ashok Savasere (1998) (6 citations) (Correct) 

....of the promising areas of research encompassing databases, statistics and machine learning [6, 12, 
15]. Recently, there has been considerable interest in finding associations between items in a database of 
customer transactions, such as the sales data collected at supermarket checkout counters [1, 2, 5, 7, 11, 
8, 14, 4]. Association rules identify items that are most often bought along with certain other items by a significant 
fraction of the customers. For example, we may find that 95% of the customers who bought bread also bought milk. 
Every rule must satisfy two user specified constraints: one is a measure .... 

M. Houtsma and A. Swami. Set-oriented mining of association rules. In Proceedings of the International 
Conference on Data Engineering, Taipei, Taiwan, March 1995. 
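
The bread-and-milk statement in the excerpt above is a confidence figure. As a minimal Python sketch, assuming the two user-specified constraints are the usual minimum support and minimum confidence (the excerpt is cut off before naming them), the two measures can be computed as follows. The function names `support` and `confidence` are illustrative and not taken from the cited paper.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing antecedent that also contain consequent."""
    ante = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    # An antecedent that never occurs yields no evidence; report 0.0.
    return both / ante if ante else 0.0
```

For instance, with four baskets of which three contain bread and two contain both bread and milk, the rule bread => milk has support 0.5 and confidence 2/3.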



Mining Quantitative Association Rules in Large Relational Tables - Srikant, Agrawal (1996) (201 citations) 
(Correct) 

....supports are counted are new. We present our experience with this solution on a real life dataset in Section 6. 
1.3 Related Work: Since the introduction of the (Boolean) Association Rules problem in [AIS93], there has 
been considerable work on designing algorithms for mining such rules [AS94] [HS95] [MTV94] [SON95] 
[PCY95]. This work was subsequently extended to finding association rules when there is a taxonomy on 
the items in [SA95] [HF95]. Related work also includes [PS91], where quantitative rules of the form x = q_x ⇒ 
y = q_y are discovered. However, the antecedent and consequent are .... 

Maurice Houtsma and Arun Swami. Set-oriented mining of association rules. In Int'l Conference on Data 
Engineering, Taipei, Taiwan, March 1995. 



Fast Sequential and Parallel Algorithms for Association Rule.. - Mueller (1995) (44 citations) (Correct) 

....paper, rules were restricted to one item in the consequent but allowed any union of items in the antecedent. 
This limitation was not justified by the algorithm itself because it finds all frequent sets, and this information is 
enough to produce rules without this limitation. 2.3.2 SETM: SETM [22] is designed to use only standard 
database operations to find frequent sets. For this reason, it uses its own data representation that stores every 
itemset supported by a transaction along with the transaction's TID. Figure 2.3 shows part of an example run of 
SETM on a tiny database and .... 

Maurice Houtsma and Arun Swami. Set-oriented mining of association rules. Technical Report RJ 9567, IBM Research 
Report, Oct. 1993. 
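
The data representation described in the excerpt above, where every itemset supported by a transaction is stored together with the transaction's TID, can be mimicked in memory. The following Python sketch is a simplified illustration under that description and is not the authors' SQL formulation: the list `rows` plays the role of the (TID, itemset) relation, the extension step stands in for the join with the transaction table, and the `Counter` stands in for the grouping and counting step. The name `setm` and the `min_count` parameter are assumptions made here.

```python
from collections import Counter


def setm(transactions, min_count):
    """Return {itemset_tuple: count} for itemsets found in >= min_count transactions.

    transactions: dict mapping a transaction id (TID) to an iterable of items.
    """
    items_by_tid = {tid: sorted(set(items)) for tid, items in transactions.items()}
    # Level 1 relation: one (TID, 1-itemset) row per item of each transaction.
    rows = [(tid, (item,)) for tid, items in items_by_tid.items() for item in items]
    frequent = {}
    while rows:
        # Counting step: group the rows by itemset.
        counts = Counter(itemset for _, itemset in rows)
        large = {s: n for s, n in counts.items() if n >= min_count}
        frequent.update(large)
        # Keep only the rows whose itemset reached the threshold.
        kept = [(tid, s) for tid, s in rows if s in large]
        # Extension step: add each later item of the same transaction.
        rows = [
            (tid, s + (item,))
            for tid, s in kept
            for item in items_by_tid[tid]
            if item > s[-1]
        ]
    return frequent
```

Because candidates are produced per transaction before any counting takes place, the intermediate relation can grow large, which is consistent with the criticism of SETM quoted in several of the other excerpts.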



An Effective Hash-Based Algorithm for Mining Association Rules - Park, Yu (1995) (114 citations) (Correct) 

....are referred to as large itemsets. Once all large itemsets are obtained, the desired association rules can be 
generated in a straightforward manner. Subsequent work in the literature followed this approach and focused on 
the large itemset generations. Various algorithms have been proposed [3, 5, 8] to discover the large 
itemsets. Generally speaking, these algorithms first construct a candidate set of large itemsets based on some 
heuristics, and then discover the subset that indeed contains large itemsets. This process can be done iteratively 
in the sense that the large itemsets discovered .... 

M. Houtsma and A. Swami. Set-Oriented Mining of Association Rules. Technical Report RJ 9567, IBM Almaden 
Research Laboratory, San Jose, CA, October 1993. 



Integrating Association Rule Mining with Relational Database.. - Sarawagi (1998) (46 citations) (Correct) 

....proposal makes use of user defined functions (UDFs) in SQL statements to selectively push parts of the 
computation into the database system. The objective was to avoid one-at-a-time record retrieval from the 
database, saving both the copying and process context switching costs. The SETM algorithm [10] for finding 
association rules was expressed in the form of SQL queries. However, as shown in [3], SETM is not efficient 
and there are no results reported on running it against a relational DBMS. Recently, the problem of expressing 
the association rules algorithm in SQL has been explored in [20] .... 
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this requires reading the database completely for each pass resulting in a large number of disk reads. In these 
algorithms, the effort spent in performing .... 

....is stopped when in some iteration n, no large itemsets are generated. The algorithm, in this case, makes n 
database scans. 2.1 Previous Work The problem of generating association rules was first introduced in [1] and an 
algorithm called AIS was proposed for mining all association rules. In [8], an algorithm called SETM was 
proposed to solve this problem using relational operations in a relational database environment. In [2] two 
new algorithms called Apriori and AprioriTid were proposed. These algorithms achieved significant improvements 
over the previous algorithms and were .... 
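
The iterative scheme described in this excerpt (one database scan per iteration, stopping when an iteration yields no large itemsets) can be sketched as follows. This is a simplified level-wise search in the spirit of Apriori, not a reproduction of any cited algorithm; the function name and the in-memory list of transactions are assumptions for illustration.

```python
from itertools import combinations

def levelwise_frequent_itemsets(transactions, min_support):
    """Return ({itemset: support}, number_of_scans) for a list of transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    passes = 0
    k = 1
    while candidates:
        passes += 1  # one full scan of the data per iteration
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        if not level:
            break  # no large itemsets in this iteration: stop
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        unions = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = [
            c for c in unions
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent, passes
```

In this sketch the scan counter only counts iterations that actually read the data; if the candidate set becomes empty, the loop ends without a further scan.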
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....D. For example, D could be a data file, a relational table, or the result of a relational expression. An algorithm 
for finding all association rules, henceforth referred to as the AIS algorithm, was presented in [4]. Another
algorithm for this task, called the SETM algorithm, has been proposed in [13]. In this paper, we present two 
new algorithms, Apriori and AprioriTid, that differ fundamentally from these algorithms. We present experimental 
results showing that the proposed algorithms always outperform the earlier algorithms. The performance gap is 
shown to increase with problem size, and .... 

....subsets of I to generate rules with multiple consequents. Due to lack of space, we do not discuss this 
subproblem further, but refer the reader to [5] for a fast algorithm. In Section 3, we show the relative 
performance of the proposed Apriori and AprioriTid algorithms against the AIS [4] and SETM [13] 
algorithms. To make the paper self contained, we include an overview of the AIS and SETM algorithms in this 
section. We also describe how the Apriori and AprioriTid algorithms can be combined into a hybrid algorithm, 
AprioriHybrid, and demonstrate the scaleup properties of this algorithm. We .... 
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Several algorithms for finding frequent sets have been presented [1, 2, 11, 14, 15, 16, 31, 35, 36, 37, 38]. The
problem of finding all frequent sets can be described in our framework as follows. The description language L
consists of all subsets X of elements of R. The selection predicate q(r; X) is true if and only if fr(X) ≥ minfr, where
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....large itemsets, and in the second pass the support for all these is measured. The above algorithms are all 
specialized black box techniques which do not use any database operations. Algorithms using only general 
purpose DBMS systems and relational algebra operations have also been proposed [5, 6]. There has been 
very limited work in parallel implementations of association algorithms. In [9] a parallel implementation of the DHP 
algorithm [8] is presented. However only simulation results on a shared nothing or distributed memory machine 
like IBM SP2 were presented. Parallel implementations .... 

M. Houtsma and A. Swami. Set-oriented mining of association rules. In RJ 9567. IBM Almaden, Oct. 1993. 



Discovering Association Rules based on Image Content - Carlos Ordonez (1999) (8 citations) (Correct) 

....content by Blobworld. 3.1. Data mining based on association rules At this point, we will consider in detail, the 
problem of finding associations. The problem of generating association rules was first introduced in [1] and an 
algorithm called AIS was proposed for mining all association rules. In [13], an algorithm called SETM was 
proposed to solve this problem using relational operations. In [2] two algorithms called Apriori and AprioriTid 
were proposed. These algorithms achieved significant improvements over the previous algorithms. The rule 
generation process was also extended to include .... 
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....⇒ Hiking Boots and Clothes ⇒ Hiking Boots may not. The former may not have minimum support, and the latter
may not have minimum confidence.
[Figure 1: Example of a Taxonomy. Node labels: Clothes, Outerwear, Jackets, Ski Pants, Shirts, Footwear, Shoes, Hiking Boots]
Earlier work on association rules [1] [2] [5] [6] [7] did not consider the
presence of taxonomies and restricted the items in association rules to the leaf-level items in the
taxonomy. However, finding rules across different levels of the taxonomy is valuable since:
• Rules at lower levels may not have minimum support. Few people may buy ....

....T', where T' contains all the items in T as well as all the ancestors of each item in T. For example, if the
transaction contained Jackets, we would add Outerwear and Clothes to get the extended transaction. We can
then run any of the algorithms for mining association rules [1] [2] [5] [6] [7] on the extended transactions to
get generalized association rules. However, this Basic algorithm is not very fast; two more sophisticated
algorithms that we propose run 2 to 5 times faster than Basic (and more than 100 times faster on one real life
dataset). We describe the Basic ....
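
The extended-transaction step of the Basic approach can be sketched in a few lines. This is an illustrative reading of the excerpt, not the authors' code: the `PARENT` mapping is a hypothetical reconstruction of the taxonomy (only Jackets → Outerwear → Clothes is confirmed by the text), and it assumes each item has at most one parent.

```python
# Hypothetical child -> parent taxonomy; only the Jackets/Outerwear/Clothes
# chain is stated in the excerpt, the other edges are assumed for the example.
PARENT = {
    "Jackets": "Outerwear",
    "Ski Pants": "Outerwear",
    "Outerwear": "Clothes",
    "Shirts": "Clothes",
    "Shoes": "Footwear",
    "Hiking Boots": "Footwear",
}

def ancestors(item, parent):
    """Walk up the taxonomy and collect every ancestor of `item`."""
    result = []
    while item in parent:
        item = parent[item]
        result.append(item)
    return result

def extend_transaction(transaction, parent):
    """Return the transaction plus all ancestors of each of its items."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item, parent))
    return extended
```

Any frequent-itemset miner can then be run on the extended transactions; a taxonomy with multiple parents per item would need a mapping from each item to a set of parents instead.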
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....a certain user specified frequency, called minimum support. The second step consists of forming implication 
rules among the large itemsets [3]. In this paper we only deal with the computationally intensive first step. Many
algorithms for finding large itemsets have been proposed in the literature [1, 7, 3, 10, 12, 6, 13, 2]. In this 
paper we will use the Apriori algorithm [2] to evaluate the effectiveness of sampling for data mining. We chose 
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Apriori since it is fast and has excellent scale up properties. We would like to observe that our results are about
sampling, and as such independent of the mining algorithm .... 

....algorithm [13] minimizes I/O by scanning the database only twice. In the first pass it generates the set of all
potentially large itemsets, and in the second pass their support is obtained. Algorithms using only general
purpose DBMS systems and relational algebra operations have also been proposed [6, 7]. A theoretical
analysis of sampling (using Chernoff bounds) for association rules was presented in [2, 10]. We look at this
problem in more detail empirically, and compare theory and experimentation. In [8] the authors compare sample 
selection schemes for data mining. They make a claim for .... 
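
As a rough illustration of the kind of guarantee such an analysis gives, the sketch below uses the additive Chernoff-Hoeffding bound, P(|p̂ − p| ≥ ε) ≤ 2·exp(−2nε²), to compute how many sampled transactions suffice so that an itemset's sample support is within ε of its true support with probability at least 1 − δ. The cited works may use a different variant of the bound; the names `epsilon`, `delta`, and `sample_size` are assumptions for this example.

```python
import math

def sample_size(epsilon, delta):
    """Smallest n with 2 * exp(-2 * n * epsilon**2) <= delta.

    epsilon: tolerated absolute error in the support estimate (0 < epsilon < 1)
    delta:   tolerated probability of exceeding that error (0 < delta < 1)
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))
```

The bound is for a single itemset and does not depend on the database size; covering many itemsets at once requires a smaller `delta` per itemset (for example via a union bound).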
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Multi-dimensional, inter-transaction association rules extend the traditional association 
rules to describe more general associations among items with multiple properties cross 
transactions. "After McDonald and Burger King open branches, KFC will open a branch two 
months later and one mile away" is an example of such rules. Since the number of 
potential Inter-transaction association rules tends to be extremely large, mining inter- 
transaction associations poses more chall ... 

2 A template model for multidimensional inter-transactional association rules 
Ling Feng, Jeffrey Xu Yu, Hongjun Lu, Jiawei Han 

October 2002 The VLDB Journal — The International Journal on Very Large Data 

Bases, Volume 11 Issue 2 
Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(313.62 KB) Additional Information: full citation, abstract, index terms

Multidimensional inter-transactional association rules extend the traditional association 
rules to describe more general associations among items with multiple properties across 
transactions. "After McDonald and Burger King open branches, KFC will open a branch two 
months later and one mile away" is an example of such rules. Since the number of 
potential inter-transactional association rules tends to be extremely large, mining inter- 
transactional associations poses more challe ... 

Keywords: Intra-transactional/inter-transactional association rules, Multidimensional 
context, Template model 
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A commercial Web page typically contains many information blocks. Apart from the main 
content blocks, it usually has such blocks as navigation panels, copyright and privacy 
notices, and advertisements (for business purposes and for easy user access). We call 
these blocks that are not the main content blocks of the page the noisy blocks. We show 
that the information contained in these noisy blocks can seriously harm Web data mining. 
Eliminating these noises is thus of great importance. In this pa ... 

Keywords: Web mining, noise detection, noise elimination 



Beyond intratransaction association analysis: mining multidimensional 
intertransaction association rules 
Hongjun Lu, Ling Feng, Jiawei Han 

October 2000 ACM Transactions on Information Systems (TOIS), Volume 18 issue 4 
Publisher: ACM Press 

Full text available: pdf(1.31 MB) Additional Information: full citation, abstract, references, citings, index terms
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In this paper, we extend the scope of mining association rules from traditional single- 
dimensional intratransaction associations, to multidimensional intertransaction 
associations. Intratransaction associations are the associations among items with the 
same transaction, where the notion of the transaction could be the items bought by the 
same customer, the events happened on the same day, and so on. However, an 
intertransaction association ... 

Keywords: association rules, data mining, intra/intertransaction, multidimensional 
context 
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of web page templates 
David Gibson, Kunal Punera, Andrew Tomkins 

May 2005 Special interest tracks and posters of the 14th international conference on 
World Wide Web 

Publisher: ACM Press 

Full text available: pdf(249.32 KB) Additional Information: full citation, abstract, references, index terms

Web pages contain a combination of unique content and template material, which is 
present across multiple pages and used primarily for formatting, navigation, and 
branding. We study the nature, evolution, and prevalence of these templates on the web. 
As part of this work, we develop new randomized algorithms for template extraction that 
perform approximately twenty times faster than existing approaches with similar quality. 
Our results show that 40-50% of the content on the web is templa ...

Keywords: algorithms, boilerplate, data cleaning, data mining, templates, web mining 
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There are usually repetitive sub-segments in broadcast videos, which may be associated 
with high-level concepts or events, e.g., news footage, repeated scores in basketball. 
Unsupervised mining techniques provide generic solutions to discovering such temporal 
patterns in various video genres, which are currently the subject of great interests to 
researchers working on multimedia content analysis. In this paper, we propose a novel 
approach to automatically detecting repetitive patterns in a video ... 

Keywords: probabilistic template, repetitive pattern discovery, video mining 
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May 2002 Proceedings of the 11th international conference on World Wide Web 

Publisher: ACM Press 
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We formulate and propose the template detection problem, and suggest a practical 
solution for it based on counting frequent item sets. We show that the use of templates is 
pervasive on the web. We describe three principles, which characterize the assumptions 
made by hypertext information retrieval (IR) and data mining (DM) systems, and show 
that templates are a major source of violation of these principles. As a consequence, basic 
"pure" implementations of simple search algorithms coupled with t ... 

Keywords: data mining, hypertext, information retrieval, web searching 
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Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(281.14 KB) Additional Information: full citation, abstract, citings, index terms

The analysis of web usage has mostly focused on sites composed of conventional static 
pages. However, huge amounts of information available in the web come from databases 
or other data collections and are presented to the users in the form of dynamically 
generated pages. The query interfaces of such sites allow the specification of many search 
criteria. Their generated results support navigation to pages of results combining cross- 
linked data from many sources. For the analysis of visitor naviga ... 

Keywords: Conceptual hierarchies, Data mining, Query capabilities, Web databases, Web 
query interfaces, Web usage mining 
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Full text available: pdf(176.00 KB) Additional Information: full citation, abstract, references, index terms

The paper presents a database intrusion detection algorithm based on user query 
frequent itemsets with item constraints. Firstly, the paper discusses the method to mine 
database user query frequent itemsets by using query template. Secondly, the paper 
discusses the constrained query templates that are used to reduce the number of frequent 
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itemsets and improve system performance. At last, the paper discusses the algorithm 
design and its application example. 

Keywords: data mining, database security, frequent itemsets, intrusion detection 



10 Background for association rules and cost estimate of selected mining algorithms 
Jia Liang Han, Ashley W. Plank 

November 1996 Proceedings of the fifth international conference on Information and 
knowledge management 

Publisher: ACM Press 

Full text available: pdf(928.44 KB) Additional Information: full citation, references, citings, index terms



Keywords: algorithm analysis, association rules, background, data mining, memory 
management, optimization, scalability 



11 An object-oriented approach to multi-level association rule mining 
Scott Fortin, Ling Liu 

November 1996 Proceedings of the fifth international conference on Information and 
knowledge management 

Publisher: ACM Press 

Full text available: pdf(996.66 KB) Additional Information: full citation, references, citings, index terms



12 Bug localization: PR-Miner: automatically extracting implicit programming rules and
detecting violations in large software code
Zhenmin Li, Yuanyuan Zhou

September 2005 Proceedings of the 10th European software engineering conference 
held jointly with 13th ACM SIGSOFT international symposium on 
Foundations of software engineering ESEC/FSE-13 
Publisher: ACM Press 

Full text available: pdf(228.58 KB) Additional Information: full citation, abstract, references, index terms

Programs usually follow many implicit programming rules, most of which are too tedious 
to be documented by programmers. When these rules are violated by programmers who 
are unaware of or forget about them, defects can be easily introduced. Therefore, it is 
highly desirable to have tools to automatically extract such rules and also to automatically 
detect violations. Previous work in this direction focuses on simple function-pair based 
programming rules and additionally requires programmer ... 

Keywords: automated specification generation, automated violation detection, data 
mining for software engineering, pattern recognition, programming rules, static analysis 



13 Editorial: special issue on web content mining 
Bing Liu, Kevin Chen-Chuan Chang

December 2004 ACM SIGKDD Explorations Newsletter volume 6 issue 2 
Publisher: ACM Press 

Full text available: pdf(178.32 KB) Additional Information: full citation, abstract, references

With the phenomenal growth of the Web, there is an ever-increasing volume of data and
information published in numerous Web pages. The research in Web mining aims to 
develop new techniques to effectively extract and mine useful knowledge or information 
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from these Web pages [8]. Due to the heterogeneity and lack of structure of Web data, 
automated discovery of targeted or unexpected knowledge/information is a challenging 
task. It calls for novel methods that draw from a wide range of fields spanni ... 

14 Intelligent jurisprudence research: a new concept 
Rosina Weber 

June 1999 Proceedings of the 7th international conference on Artificial intelligence 
and law 

Publisher: ACM Press 

Full text available: pdf(1.20 MB) Additional Information: full citation, abstract, references, citings, index terms

Intelligent Jurisprudence Research (IJR) is a concept that consists in performing 
jurisprudence research with a computational tool that employs Artificial Intelligence (AI) 
techniques. Jurisprudence research is the search employed by judicial professionals when 
seeking for past legal situations that may be useful to a legal activity. When humans 
perform jurisprudence research, they employ analogical reasoning in comparing a given 
actual situation with past decisions, noting the affinities b ... 

Keywords: artificial intelligence and law, case-based reasoning, case-based retrieval, 
jurisprudence 



15 Exploratory mining and pruning optimizations of constrained associations rules
Raymond T. Ng, Laks V. S. Lakshmanan, Jiawei Han, Alex Pang 

June 1998 ACM SIGMOD Record, Proceedings of the 1998 ACM SIGMOD international
conference on Management of data SIGMOD '98, Volume 27 issue 2
Publisher: ACM Press

Full text available: pdf(1.65 MB) Additional Information: full citation, abstract, references, citings, index terms

From the standpoint of supporting human-centered discovery of knowledge, the present- 
day model of mining association rules suffers from the following serious shortcomings: (i) 
lack of user exploration and control, (ii) lack of focus, and (iii) rigid notion of 
relationships. In effect, this model functions as a black-box, admitting little user 
interaction in between. We propose, in this paper, an architecture that opens up the 
black-box, and supports constraint-based, human-centered explorat ... 

16 Towards on-line analytical mining in large databases
Jiawei Han

March 1998 ACM SIGMOD Record, Volume 27 issue 1
Publisher: ACM Press 

Full text available: pdf(387.04 KB) Additional Information: full citation, abstract, citings, index terms

Great efforts have been paid in the Intelligent Database Systems Research Lab for the 
research and development of efficient data mining methods and construction of on-line 
analytical data mining systems. Our work has been focused on the integration of data 
mining and OLAP technologies and the development of scalable, integrated, and multiple 
data mining functions. A data mining system, DBMiner, has been developed for interactive 
mining of multiple-level knowledge in large relational databases and ... 

17 Industrial/government track: Capturing best practice for microarray gene expression
data analysis
Gregory Piatetsky-Shapiro, Tom Khabaza, Sridhar Ramaswamy

August 2003 Proceedings of the ninth ACM SIGKDD international conference on
Knowledge discovery and data mining
Publisher: ACM Press
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Full text available: pdf(874.16 KB) Additional Information: full citation, abstract, references, index terms

Analyzing gene expression data from microarray devices has many important applications
in medicine and biology, but presents significant challenges to data mining. Microarray 
data typically has many attributes (genes) and few examples (samples), making the 
process of correctly analyzing such data difficult to formulate and prone to common 
mistakes. For this reason it is unusually important to capture and record good practices 
for this form of data mining. This paper presents a process for analyzi ... 

Keywords: Clementine, application template, data mining process, gene expression, 
microarrays 



18 Poster papers - short papers: Extracting unstructured data from template generated
web documents
Ling Ma, Nazli Goharian, Abdur Chowdhury, Misun Chung

November 2003 Proceedings of the twelfth international conference on Information
and knowledge management
Publisher: ACM Press

Full text available: pdf(210.48 KB) Additional Information: full citation, abstract, references, citings, index terms

We propose a novel approach that identifies web page templates and extracts the 
unstructured data. Extracting only the body of the page and eliminating the template 
increases the retrieval precision for the queries that generate irrelevant results. We 
believe that by reducing the number of irrelevant results; the users are encouraged to go 
back to a given site to search. Our experimental results on several different web sites and 
on the whole cnnfn collection demonstrate the feasibility of our a ... 

Keywords: automatic template removal, information retrieval, retrieval accuracy, text 
extraction 



19 Research track poster: Simultaneous optimization of complex mining tasks with a
knowledgeable cache
Ruoming Jin, Kaushik Sinha, Gagan Agrawal 

August 2005 Proceeding of the eleventh ACM SIGKDD international conference on 
Knowledge discovery in data mining KDD '05 

Publisher: ACM Press 

Full text available: pdf(750.91 KB) Additional Information: full citation, abstract, references, index terms

With an increasing use of data mining tools and techniques, we envision that a Knowledge 
Discovery and Data Mining System (KDDMS) will have to support and optimize for the 
following scenarios: 1) Sequence of Queries: A user may analyze one or more datasets by 
issuing a sequence of related complex mining queries, and 2) Multiple Simultaneous 
Queries: Several users may be analyzing a set of datasets concurrently, and may issue 
related complex queries. This paper presents a systematic ...

Keywords: frequent pattern mining, knowledgeable cache, multiple query optimization 




20 Bug localization: DynaMine: finding common error patterns by mining software
revision histories

Benjamin Livshits, Thomas Zimmermann 

September 2005 Proceedings of the 10th European software engineering conference 
held jointly with 13th ACM SIGSOFT international symposium on 
Foundations of software engineering ESEC/FSE-13
Publisher: ACM Press 

Full text available: pdf(182.92 KB) Additional Information: full citation, abstract, references, index terms

A great deal of attention has lately been given to addressing software bugs such as errors 
in operating system drivers or security bugs. However, there are many other lesser 
known errors specific to individual applications or APIs and these violations of application- 
specific coding rules are responsible for a multitude of errors. In this paper we propose 
DynaMine, a tool that analyzes source code check-ins to find highly correlated method 
calls as well as common bug fixes in order to automatical ... 

Keywords: coding patterns, data mining, dynamic analysis, error patterns, one-line 
check-ins, revision histories, software bugs 
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1 Design patterns for sorting 
Dung "Zung" Nguyen, Stephen B. Wong 

February 2001 ACM SIGCSE Bulletin, Proceedings of the thirty-second SIGCSE
technical symposium on Computer Science Education SIGCSE '01,
Volume 33 Issue 1
Publisher: ACM Press 

Additional Information: full citation, abstract, references, citings, index terms

Drawing on Merritt's divide-and-conquer sorting taxonomy [1], we model comparison- 
based sorting as an abstract class with a template method to perform the sort by 
relegating the splitting and joining of arrays to its concrete subclasses. Comparison on 
objects is carried out via an abstract ordering strategy. This reduces code complexity and 
simplifies the analyses of the various concrete sorting algorithms. Performance 
measurements and visualizations can be added without modifying any code by ut ... 
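
The design this abstract describes (an abstract sorter whose template method delegates splitting and joining to concrete subclasses, with comparison supplied as a strategy) can be sketched as below. This is an illustration of the pattern only, not the authors' code; the class names `Sorter`, `MergeSorter`, `SelectionSorter` and the `less_than` callable are assumptions for the example.

```python
from abc import ABC, abstractmethod

class Sorter(ABC):
    """Comparison-based sort expressed as a template method."""

    def __init__(self, less_than=lambda a, b: a < b):
        self.less_than = less_than  # ordering strategy, injected by the caller

    def sort(self, items):
        # Template method: fixed skeleton, variable split/join steps.
        if len(items) <= 1:
            return list(items)
        left, right = self.split(items)
        return self.join(self.sort(left), self.sort(right))

    @abstractmethod
    def split(self, items):
        """Divide items into two non-empty parts."""

    @abstractmethod
    def join(self, left, right):
        """Combine two sorted parts into one sorted list."""

class MergeSorter(Sorter):
    # Easy split, hard join.
    def split(self, items):
        mid = len(items) // 2
        return items[:mid], items[mid:]

    def join(self, left, right):
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if self.less_than(right[j], left[i]):
                merged.append(right[j])
                j += 1
            else:
                merged.append(left[i])
                i += 1
        return merged + left[i:] + right[j:]

class SelectionSorter(Sorter):
    # Hard split (extract the first element in the ordering), easy join.
    def split(self, items):
        k = 0
        for i in range(1, len(items)):
            if self.less_than(items[i], items[k]):
                k = i
        return [items[k]], list(items[:k]) + list(items[k + 1:])

    def join(self, left, right):
        return left + right
```

For example, `MergeSorter().sort([3, 1, 2])` sorts ascending, while `SelectionSorter(lambda a, b: a > b).sort([3, 1, 2])` sorts descending with the same skeleton and a different ordering strategy.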

2 Fast detection of communication patterns in distributed executions 
Thomas Kunz, Michiel F. H. Seuren 

November 1997 Proceedings of the 1997 conference of the Centre for Advanced
Studies on Collaborative research
Publisher: IBM Press 

Full text available: pdf(4.21 MB) Additional Information: full citation, abstract, references, index terms

Understanding distributed applications is a tedious and difficult task. Visualizations based 
on process-time diagrams are often used to obtain a better understanding of the 
execution of the application. The visualization tool we use is Poet, an event tracer 
developed at the University of Waterloo. However, these diagrams are often very complex 
and do not provide the user with the desired overview of the application. In our 
experience, such tools display repeated occurrences of non-trivial commun ... 

3 Query evaluation techniques for large databases 
Goetz Graefe 

June 1993 ACM Computing Surveys (CSUR), Volume 25 issue 2 
Publisher: ACM Press 

Full text available: pdf(9.37 MB) Additional Information: full citation, abstract, references, citings, index terms, review



Database management systems will continue to manage large data volumes. Thus, 
efficient algorithms for accessing and manipulating large sets and sequences will be 
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required to provide acceptable performance. The advent of object-oriented and extensible 
database systems will not solve this problem. On the contrary, modern data models 
exacerbate the problem: In order to manipulate large sets of complex objects as 
efficiently as today's database systems manipulate simple records, query-processi ... 

Keywords: complex query evaluation plans, dynamic query evaluation plans, extensible 
database systems, iterators, object-oriented database systems, operator model of 
parallelization, parallel algorithms, relational database systems, set-matching algorithms, 
sort-hash duality 



4 XML query processing II: A comprehensive XQuery to SQL translation using dynamic
interval encoding
David DeHaan, David Toman, Mariano P. Consens, M. Tamer Ozsu

June 2003 Proceedings of the 2003 ACM SIGMOD international conference on
Management of data
Publisher: ACM Press 

Full text available: pdf(242.20 KB) Additional Information: full citation, abstract, references, citings, index terms

The W3C XQuery language recommendation, based on a hierarchical and ordered 
document model, supports a wide variety of constructs and use cases. There is a diversity 
of approaches and strategies for evaluating XQuery expressions, in many cases only 
dealing with limited subsets of the language. In this paper we describe an implementation 
approach that handles XQuery with arbitrarily-nested FLWR expressions, element 
constructors and built-in functions (including structural comparisons). Our propos ... 

5 External memory algorithms and data structures: dealing with massive data 
Jeffrey Scott Vitter 

June 2001 ACM Computing Surveys (CSUR), Volume 33 issue 2 
Publisher: ACM Press 

Full text available: pdf(828.46 KB) Additional Information: full citation, abstract, references, citings, index terms

Data sets in large applications are often too massive to fit completely inside the 
computers internal memory. The resulting input/output communication (or I/O) between 
fast internal memory and slower external memory (such as disks) can be a major 
performance bottleneck. In this article we survey the state of the art in the design and 
analysis of external memory (or EM) algorithms and data structures, where the goal is to 
exploit locality in order to reduce the I/O costs. We consider a varie ... 

Keywords: B-tree, I/O, batched, block, disk, dynamic, extendible hashing, external 
memory, hierarchical memory, multidimensional access methods, multilevel memory, 
online, out-of-core, secondary storage, sorting 



6 GPGPU: general purpose computation on graphics hardware

David Luebke, Mark Harris, Jens Kruger, Tim Purcell, Naga Govindaraju, Ian Buck, Cliff 
Woolley, Aaron Lefohn 

August 2004 Proceedings of the conference on SIGGRAPH 2004 course notes GRAPH 
'04 

Publisher: ACM Press 

Full text available: pdf(63.03 MB) Additional Information: full citation, abstract

The graphics processor (GPU) on today's commodity video cards has evolved into an 
extremely powerful and flexible processor. The latest graphics architectures provide 
tremendous memory bandwidth and computational horsepower, with fully programmable 
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vertex and pixel processing units that support vector operations up to full IEEE floating
point precision. High level languages have emerged for graphics hardware, making this 
computational power accessible. Architecturally, GPUs are highly parallel s ... 

7 Applications: Dynamic coordination of information management services for
processing dynamic web content 
In-Young Ko, Ke-Thia Yao, Robert Neches 

May 2002 Proceedings of the 11th international conference on World Wide Web 

Publisher: ACM Press 

Full text available: pdf(1.15 MB) Additional Information: full citation, abstract, references, citings, index terms

Dynamic Web content provides us with time-sensitive and continuously changing data. To 
glean up-to-date information, users need to regularly browse, collect and analyze this 
Web content. Without proper tool support this information management task is tedious, 
time-consuming and error prone, especially when the quantity of the dynamic Web 
content is large, when many information management services are needed to analyze it, 
and when underlying services/network are not completely reliable. This pap ... 

Keywords: dynamic service coordination, dynamic web content, scalable component- 
based software systems, semantic interoperability, web information management systems 



8 Query optimization I: Access paths in the "Abe" statistical query facility
Anthony Klug 

June 1982 Proceedings of the 1982 ACM SIGMOD international conference on
Management of data SIGMOD '82 

Publisher: ACM Press 

Full text available: pdf(1.07 MB) Additional Information: full citation, abstract, references, citings

An increasingly important part of information processing today involves the taking of 
counts, sums, averages, and other statistical or aggregate quantities. The "Abe" query 
language is designed to make formulation of complicated aggregations simple. Access 
path selection in Abe finds efficient ways to execute these complicated queries. Access 
paths for Abe queries perform "aggregate joins", that is, they compute aggregate 
quantities at the same time as they join subqueries with parent queries. T ... 

9 Level set and PDE methods for computer graphics

David Breen, Ron Fedkiw, Ken Museth, Stanley Osher, Guillermo Sapiro, Ross Whitaker 
August 2004 Proceedings of the conference on SIGGRAPH 2004 course notes GRAPH 
'04 

Publisher: ACM Press 

Full text available: pdf(17.07 MB) Additional Information: full citation, abstract

Level set methods, an important class of partial differential equation (PDE) methods, 
define dynamic surfaces implicitly as the level set (iso-surface) of a sampled, evolving nD 
function. The course begins with preparatory material that introduces the concept of using 
partial differential equations to solve problems in computer graphics, geometric modeling 
and computer vision. This will include the structure and behavior of several different types 
of differential equations, e.g. the level set eq ... 

10 Special issue on prototypes of deductive database systems: The Aditi deductive
database system

Jayen Vaghani, Kotagiri Ramamohanarao, David B. Kemp, Zoltan Somogyi, Peter J. Stuckey, 
Tim S. Leask, James Harland 
April 1994 The VLDB Journal - The International Journal on Very Large Data Bases,
Volume 3 Issue 2
Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(2.67 MB) Additional Information: full citation, abstract, references, citings

Deductive databases generalize relational databases by providing support for recursive 
views and non-atomic data. Aditi is a deductive system based on the client-server model; 
it is inherently multi-user and capable of exploiting parallelism on shared-memory 
multiprocessors. The back-end uses relational technology for efficiency in the 
management of disk-based data and uses optimization algorithms especially developed 
for the bottom-up evaluation of logical queries involving recursion. The front ... 

Keywords: implementation, logic, multi-user, parallelism, relational database 



11 Concurrent object-oriented programming
Gul Agha

September 1990 Communications of the ACM, Volume 33 issue 9
Publisher: ACM Press

Full text available: pdf(2.06 MB) Additional Information: full citation, abstract, references, citings, index terms

Three significant trends have underscored the central role of concurrency in computing. 
First, there is increased use of interacting processes by individual users, for example, 
application programs running on X windows. Second, workstation networks have become 
a cost-effective mechanism for resource sharing and distributed problem solving. For 
example, loosely coupled problems, such as finding all the factors of large prime numbers, 
have been solved by utilizing idle cycles on networks of ... 

12 Computational strategies for object recognition
Paul Suetens, Pascal Fua, Andrew J. Hanson

March 1992 ACM Computing Surveys (CSUR), Volume 24 issue 1

Publisher: ACM Press

Full text available: pdf(6.37 MB) Additional Information: full citation, references, index terms, review

This article reviews the available methods for automated identification of objects in digital 
images. The techniques are classified into groups according to the nature of the 
computational strategy used. Four classes are proposed: (1) the simplest strategies, 
which work on data appropriate for feature vector classification, (2) methods that match 
models to symbolic data structures for situations involving reliable data and complex 
models, (3) approaches that fit models to the photometry and ... 

Keywords: image understanding, model-based vision, object recognition 



13 Graphical query interfaces for semistructured data: the QURSED system 
Michalis Petropoulos, Yannis Papakonstantinou, Vasilis Vassalos 
May 2005 ACM Transactions on Internet Technology (TOIT), volume 5 issue 2 

Publisher: ACM Press 

Full text available: pdf(6.88 MB) Additional Information: full citation, abstract, references, index terms

We describe the QURSED system for the declarative specification and automatic 
generation of Web-based query forms and reports (QFRs) for semistructured XML data. In 
QURSED, a QFR is formally described by its query set specification (QSS) which captures 
the complex query and reporting capabilities of the QFR and the associations of the query 
set specification with visual elements that implement these capabilities on a Web page. 
The design-time component of QURSE ... 

14 Concepts and paradigms of object-oriented programming 
Peter Wegner

August 1990 ACM SIGPLAN OOPS Messenger, Volume 1 issue 1

Publisher: ACM Press

Full text available: pdf(5.52 MB) Additional Information: full citation, abstract, citings, index terms

We address the following questions for object-oriented programming: What is it? What are
its goals? What are its origins? What are its paradigms? What are its design alternatives?
What are its models of concurrency? What are its formal computational models? What
comes after object-oriented programming? Starting from software engineering goals, we
examine the origins and paradigms of object-oriented programming, explore its language 
design alternativ ... 

15 XML parsing and stylesheets: Compiling XSLT 2.0 into XQuery 1.0 
Achille Fokoue, Kristoffer Rose, Jerome Simeon, Lionel Villard 

May 2005 Proceedings of the 14th international conference on World Wide Web 
Publisher: ACM Press 

Full text available: pdf(143.18 KB) Additional Information: full citation, abstract, references, index terms

As XQuery is gathering momentum as the standard query language for XML, there is a 
growing interest in using it as an integral part of the XML application development 
infrastructure. In that context, one question which is often raised is how well XQuery 
interoperates with other XML languages, and notably with XSLT. XQuery 1.0 [16] and 
XSLT 2.0 [7] share a lot in common: they share XPath 2.0 as a common sub-language 
and have the same expressiveness. However, they are based on fairly different pro ... 

Keywords: Web services, XML, XQuery, XSLT 



16 WSQ/DSQ: a practical approach for combined querying of databases and the Web 
Roy Goldman, Jennifer Widom

May 2000 ACM SIGMOD Record, Proceedings of the 2000 ACM SIGMOD international
conference on Management of data SIGMOD '00, Volume 29 issue 2
Publisher: ACM Press

Full text available: pdf(223.65 KB) Additional Information: full citation, abstract, references, citings, index terms

We present WSQ/DSQ (pronounced "wisk-disk"), a new approach for combining the query 
facilities of traditional databases with existing search engines on the Web. WSQ, for Web- 
Supported (Database) Queries, leverages results from Web searches to enhance SQL 
queries over a relational database. DSQ, for Database-Supported (Web) Queries, uses 
information stored in the database to enhance and explain Web searches. This paper 
focuses primarily on WSQ, describing a simple, lo ... 

17 Research sessions: XML II: Approximate XML joins 
Sudipto Guha, H. V. Jagadish, Nick Koudas, Divesh Srivastava, Ting Yu 
June 2002 Proceedings of the 2002 ACM SIGMOD international conference on
Management of data SIGMOD '02
Publisher: ACM Press

Full text available: pdf(1.25 MB) Additional Information: full citation, abstract, references, citings, index terms

XML is widely recognized as the data interchange standard for tomorrow, because of its 
ability to represent data from a wide variety of sources. Hence, XML is likely to be the
format through which data from multiple sources is integrated. In this paper we study the 
problem of integrating XML data sources through correlations realized as join operations. 
A challenging aspect of this operation is the XML document structure. Two documents 
might convey approximately or exactly the same information but m ... 

18 Optimizing object queries using an effective calculus
Leonidas Fegaras, David Maier

December 2000 ACM Transactions on Database Systems (TODS), Volume 25 issue 4
Publisher: ACM Press

Full text available: pdf(641.65 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Object-oriented databases (OODBs) provide powerful data abstractions and modeling 
facilities, but they generally lack a suitable framework for query processing and 
optimization. The development of an effective query optimizer is one of the key factors for 
OODB systems to successfully compete with relational systems, as well as to meet the 
performance requirements of many nontraditional applications. We propose an effective 
framework with a solid theoretical basis for optimizing OODB query I ... 

Keywords: nested relations, object-oriented databases, query decorrelation, query 
optimization 



19 Model-based recognition in robot vision
Roland T. Chin, Charles R. Dyer

March 1986 ACM Computing Surveys (CSUR), Volume 18 issue 1
Publisher: ACM Press

Full text available: pdf(4.94 MB) Additional Information: full citation, abstract, references, citings, index terms, review

This paper presents a comparative study and survey of model-based object-recognition 
algorithms for robot vision. The goal of these algorithms is to recognize the identity, 
position, and orientation of randomly oriented industrial parts. In one form this is 
commonly referred to as the "bin-picking" problem, in which the parts to be recognized 
are presented in a jumbled bin. The paper is organized according to 2-D, 2½-D, and 3-D
object representations, which are used as the basis for ... 

20 Query optimization in the presence of limited access patterns
Daniela Florescu, Alon Levy, Ioana Manolescu, Dan Suciu

June 1999 ACM SIGMOD Record, Proceedings of the 1999 ACM SIGMOD international
conference on Management of data SIGMOD '99, Volume 28 issue 2
Publisher: ACM Press

Full text available: pdf(1.66 MB) Additional Information: full citation, abstract, references, citings, index terms

We consider the problem of query optimization in the presence of limitations on access 
patterns to the data (i.e., when one must provide values for one of the attributes of a 
relation in order to obtain tuples). We show that in the presence of limited access patterns 
we must search a space of annotated query plans, where the annotations describe the 
inputs that must be given to the plan. We describe a theoretical and experimental 
analysis of the resulting search space and a ... 
1 Data mining: Mining the peanut gallery: opinion extraction and semantic classification
of product reviews

Kushal Dave, Steve Lawrence, David M. Pennock

May 2003 Proceedings of the 12th international conference on World Wide Web

Publisher: ACM Press

Full text available: pdf(327.95 KB) Additional Information: full citation, abstract, references, citings, index terms



The web contains a wealth of product reviews, but sifting through them is a daunting 
task. Ideally, an opinion mining tool would process a set of search results for a given 
item, generating a list of product attributes (quality, features, etc.) and aggregating 
opinions about each of them (poor, mixed, good). We begin by identifying the unique 
properties of this problem and develop a method for automatically distinguishing between 
positive and negative reviews. Our classifier draws on information ... 



Keywords: document classification, opinion mining 
repetitive patterns in broadcast videos
Peng Wang, Zhi-Qiang Liu, Shi-Qiang Yang 

November 2005 Proceedings of the 13th annual ACM international conference on 
Multimedia MULTIMEDIA '05

Publisher: ACM Press 

Full text available: pdf(163.58 KB) Additional Information: full citation, abstract, references, index terms

There are usually repetitive sub-segments in broadcast videos, which may be associated 
with high-level concepts or events, e.g., news footage, repeated scores in basketball. 
Unsupervised mining techniques provide generic solutions to discovering such temporal 
patterns in various video genres, which are currently the subject of great interest to
researchers working on multimedia content analysis. In this paper, we propose a novel 
approach to automatically detecting repetitive patterns in a video ... 

Keywords: probabilistic template, repetitive pattern discovery, video mining 
November 1997 Proceedings of the 1997 conference of the Centre for Advanced
Studies on Collaborative research
Publisher: IBM Press 

Full text available: pdf(4.21 MB) Additional Information: full citation, abstract, references, index terms

Understanding distributed applications is a tedious and difficult task. Visualizations based 
on process-time diagrams are often used to obtain a better understanding of the 
execution of the application. The visualization tool we use is Poet, an event tracer 
developed at the University of Waterloo. However, these diagrams are often very complex 
and do not provide the user with the desired overview of the application. In our 
experience, such tools display repeated occurrences of non-trivial commun ... 

4 Mining semantics for large scale integration on the web: evidences, insights, and
challenges

Kevin Chen-Chuan Chang, Bin He, Zhen Zhang

December 2004 ACM SIGKDD Explorations Newsletter Volume 6 issue 2 

Publisher: ACM Press 

Full text available: pdf(466.64 KB) Additional Information: full citation, abstract, references

The Web has been rapidly "deepened" — with myriad searchable databases online, where 
data are hidden behind query interfaces. Toward large scale integration over this "deep 
Web," we are facing a new challenge: With its dynamic and ad-hoc nature, such large
scale integration mandates dynamic semantics discovery. That is, we must on-the-fly 
cope with "semantics" of dynamically discovered sources without pre-configured source- 
specific knowledge. To tackle this challenge, our initial ... 

5 Bug localization: DynaMine: finding common error patterns by mining software 
revision histories 

Benjamin Livshits, Thomas Zimmermann 

September 2005 Proceedings of the 10th European software engineering conference 
held jointly with 13th ACM SIGSOFT international symposium on 
Foundations of software engineering ESEC/FSE-13 
Publisher: ACM Press 

Full text available: pdf(182.92 KB) Additional Information: full citation, abstract, references, index terms

A great deal of attention has lately been given to addressing software bugs such as errors 
in operating system drivers or security bugs. However, there are many other lesser 
known errors specific to individual applications or APIs and these violations of application- 
specific coding rules are responsible for a multitude of errors. In this paper we propose 
DynaMine, a tool that analyzes source code check-ins to find highly correlated method 
calls as well as common bug fixes in order to automatical ... 

Keywords: coding patterns, data mining, dynamic analysis, error patterns, one-line 
check-ins, revision histories, software bugs 
Exploration mining in diabetic patients databases: findings and conclusions
Wynne Hsu, Mong Li Lee, Bing Liu, Tok Wang Ling 

August 2000 Proceedings of the sixth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Publisher: ACM Press 

Full text available: pdf(136.53 KB) Additional Information: full citation, references, citings, index terms
expression patterns
Daxin Jiang, Jian Pei, Aidong Zhang 

December 2003 ACM SIGKDD Explorations Newsletter, volume 5 issue 2 
Publisher: ACM Press 

Full text available: pdf(527.68 KB) Additional Information: full citation, abstract, references

Analyzing coherent gene expression patterns is an important task in bioinformatics 
research and biomedical applications. Recently, various clustering methods have been 
adapted or proposed to identify clusters of co-expressed genes and recognize coherent 
expression patterns as the centroids of the clusters. However, the interpretation of co- 
expressed genes and coherent patterns mainly depends on the domain knowledge, which 
presents several challenges for coherent pattern mining and cannot be solv ... 

8 Bug localization: PR-Miner: automatically extracting implicit programming rules and
detecting violations in large software code
Zhenmin Li, Yuanyuan Zhou 

September 2005 Proceedings of the 10th European software engineering conference 
held jointly with 13th ACM SIGSOFT international symposium on 
Foundations of software engineering ESEC/FSE-13 
Publisher: ACM Press 

Full text available: pdf(228.58 KB) Additional Information: full citation, abstract, references, index terms

Programs usually follow many implicit programming rules, most of which are too tedious 
to be documented by programmers. When these rules are violated by programmers who 
are unaware of or forget about them, defects can be easily introduced. Therefore, it is 
highly desirable to have tools to automatically extract such rules and also to automatically 
detect violations. Previous work in this direction focuses on simple function-pair based 
programming rules and additionally requires programmer ... 

Keywords: automated specification generation, automated violation detection, data 
mining for software engineering, pattern recognition, programming rules, static analysis 





9 Poster papers: Finding surprising patterns in a time series database in linear time
and space

Eamonn Keogh, Stefano Lonardi, Bill 'Yuan-chi' Chiu

July 2002 Proceedings of the eighth ACM SIGKDD international conference on 

Knowledge discovery and data mining 
Publisher: ACM Press 

Full text available: pdf(686.15 KB) Additional Information: full citation, abstract, references, citings, index terms

The problem of finding a specified pattern in a time series database (i.e. query by 
content) has received much attention and is now a relatively mature field. In contrast, the 
important problem of enumerating all surprising or interesting patterns has received far 
less attention. This problem requires a meaningful definition of "surprise", and an efficient 
search technique. All previous attempts at finding surprising patterns in time series use a 
very limited notion of surprise, and/or do not sc ... 

Keywords: Markov Model, anomaly detection, feature extraction, novelty detection, 
suffix tree, time series 
Jia Liang Han, Ashley W. Plank

November 1996 Proceedings of the fifth international conference on Information and 
knowledge management 
Keywords: algorithm analysis, association rules, background, data mining, memory
management, optimization, scalability 



11 Patterns and aspects: Detecting higher-level similarity patterns in programs 
Hamid Abdul Basit, Stan Jarzabek 

September 2005 Proceedings of the 10th European software engineering conference 
held jointly with 13th ACM SIGSOFT international symposium on 
Foundations of software engineering ESEC/FSE-13 
Publisher: ACM Press 

Full text available: pdf(641.23 KB) Additional Information: full citation, abstract, references, index terms

Cloning in software systems is known to create problems during software maintenance. 
Several techniques have been proposed to detect the same or similar code fragments in 
software, so-called simple clones. While the knowledge of simple clones is useful, 
detecting design-level similarities in software could ease maintenance even further, and 
also help us identify reuse opportunities. We observed that recurring patterns of simple 
clones - so-called structural clones - often indicate ... 

Keywords: clone detection, similarity patterns, software clones 



12 Industrial and practical experience track paper session 1: The volume and evolution
of web page templates
David Gibson, Kunal Punera, Andrew Tomkins 

May 2005 Special interest tracks and posters of the 14th international conference on 
World Wide Web 

Publisher: ACM Press 

Full text available: pdf(249.32 KB) Additional Information: full citation, abstract, references, index terms

Web pages contain a combination of unique content and template material, which is 
present across multiple pages and used primarily for formatting, navigation, and 
branding. We study the nature, evolution, and prevalence of these templates on the web. 
As part of this work, we develop new randomized algorithms for template extraction that 
perform approximately twenty times faster than existing approaches with similar quality. 
Our results show that 40-50% of the content on the web is templa ... 

Keywords: algorithms, boilerplate, data cleaning, data mining, templates, web mining 




13 Web and e-business application: Application run time estimation: a quality of service
metric for web-based data mining services
Shonali Krishnaswamy, Seng Wai Loke, Arkady Zaslavsky

March 2002 Proceedings of the 2002 ACM symposium on Applied computing 

Publisher: ACM Press 

Full text available: pdf(685.04 KB) Additional Information: full citation, abstract, references, citings, index terms

The emergence of Application Service Providers (ASP) hosting Internet-based data mining 
services is being seen as a viable alternative for organisations that value their knowledge 
resources but are constrained by the high cost of data mining software. Response time is 
an important Quality of Service (QoS) metric for web-based data mining service 
providers. The ability to estimate the response time of data mining algorithms a priori
benefits both clients and service providers. The advantage for th ... 

Keywords: application run time estimation, data mining e-services, quality of service, 
rough sets 



14 Emergent web patterns: The connectivity sonar: detecting site functionality by
structural patterns

Einat Amitay, David Carmel, Adam Darlow, Ronny Lempel, Aya Soffer

August 2003 Proceedings of the fourteenth ACM conference on Hypertext and
hypermedia
Publisher: ACM Press

Full text available: pdf(153.40 KB) Additional Information: full citation, abstract, references, citings, index terms

Web sites today serve many different functions, such as corporate sites, search engines, 
e-stores, and so forth. As sites are created for different purposes, their structure and 
connectivity characteristics vary. However, this research argues that sites of similar role 
exhibit similar structural patterns, as the functionality of a site naturally induces a typical 
hyperlinked structure and typical connectivity patterns to and from the rest of the Web. 
Thus, the functionality of Web sites is refle ... 

Keywords: link analysis, web IR, web graphs 



15 Sequence mining in categorical domains: incorporating constraints
Mohammed J. Zaki 

November 2000 Proceedings of the ninth international conference on Information and 
knowledge management 

Publisher: ACM Press 

Full text available: pdf(622.26 KB) Additional Information: full citation, references, citings, index terms




16 Computing curricula 2001 

September 2001 Journal on Educational Resources in Computing (JERIC) 

Publisher: ACM Press 

Full text available: pdf(613.63 KB)
17 Research track: Eliminating noisy information in Web pages for data mining
Lan Yi, Bing Liu, Xiaoli Li

August 2003 Proceedings of the ninth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Publisher: ACM Press 

Full text available: pdf(468.87 KB) Additional Information: full citation, abstract, references, citings, index terms

A commercial Web page typically contains many information blocks. Apart from the main 
content blocks, it usually has such blocks as navigation panels, copyright and privacy 
notices, and advertisements (for business purposes and for easy user access). We call 
these blocks that are not the main content blocks of the page the noisy blocks. We show 
that the information contained in these noisy blocks can seriously harm Web data mining.
Eliminating these noises is thus of great importance. In this pa ... 
18 Web services and performance evaluation: Indexing web access-logs for pattern
queries

Alexandros Nanopoulos, Yannis Manolopoulos, Maciej Zakrzewicz, Tadeusz Morzy

November 2002 Proceedings of the 4th international workshop on Web information 

and data management 
Publisher: ACM Press 

Full text available: pdf(187.24 KB) Additional Information: full citation, abstract, references, index terms

In this paper, we develop a new indexing method for large web access-logs. We are 
concerned with pattern queries, which advocate the search for access sequences that 
contain certain query patterns. This kind of queries find applications in processing web-log 
mining results (e.g., finding typical/atypical access-sequences). The proposed method 
focuses on scalability to web-logs' sizes. For this reason, we examine the gains due to 
signature-trees, which can further improve the scalability to very ... 

19 Knowledge discovery in data warehouses
Themistoklis Palpanas

September 2000 ACM SIGMOD Record, Volume 29 issue 3
Publisher: ACM Press 

Full text available: pdf(240.77 KB) Additional Information: full citation, abstract, citings, index terms

As the size of data warehouses increases to several hundreds of gigabytes or terabytes,
the need for methods and tools that will automate the process of knowledge extraction, or 
guide the user to subsets of the dataset that are of particular interest, is becoming 
prominent. In this survey paper we explore the problem of identifying and extracting 
interesting knowledge from large collections of data residing in data warehouses, by using 
data mining techniques. Such techniques have the ability to i ... 

20 Sequence Mining: Efficient and robust feature extraction and pattern matching of time
series by a lattice structure

Polly Wan Po Man, Man Hon Wong

October 2001 Proceedings of the tenth international conference on Information and 

knowledge management 
Publisher: ACM Press 

Full text available: pdf(1.48 MB) Additional Information: full citation, abstract, references, citings, index terms

The efficiency of searching scaling-invariant and shifting-invariant shapes in a set of 
massive time series data can be improved if searching is performed on an approximated 
sequence which involves less data but contains all the significant features. However, 
commonly used smoothing techniques, such as moving averages and best-fitting 
polylines, usually miss important peaks and troughs and deform the time series. In 
addition, these techniques are not robust, as they often require users to suppl ... 
Exploratory mining and pruning optimizations of constrained associations rules
Raymond T. Ng, Laks V. S. Lakshmanan, Jiawei Han, Alex Pang 

June 1998 ACM SIGMOD Record , Proceedings of the 1998 ACM SIGMOD international 

conference on Management of data SIGMOD '98, Volume 27 issue 2 
Publisher: ACM Press 

Full text available: pdf(1.65 MB) Additional Information: full citation, abstract, references, citings, index terms



From the standpoint of supporting human-centered discovery of knowledge, the present- 
day model of mining association rules suffers from the following serious shortcomings: (i) 
lack of user exploration and control, (ii) lack of focus, and (iii) rigid notion of 
relationships. In effect, this model functions as a black-box, admitting little user 
interaction in between. We propose, in this paper, an architecture that opens up the 
black-box, and supports constraint-based, human-centered explorat ... 



2 Multi Relational Data Mining (MRDM): Scalability and efficiency in multi-relational
data mining

Hendrik Blockeel, Michele Sebag

July 2003 ACM SIGKDD Explorations Newsletter, Volume 5 issue 1
Publisher: ACM Press 

Full text available: pdf(1.61 MB) Additional Information: full citation, abstract, references, citings

Efficiency and Scalability have always been important concerns in the field of data mining, 
and are even more so in the multi-relational context, which is inherently more complex. 
The issue has been receiving an increasing amount of attention during the last few years, 
and quite a number of theoretical results, algorithms and implementations have been 
presented that explicitly aim at improving the efficiency and Scalability of multi-relational 
data mining approaches. With this article we attempt ... 



3 Pattern discovery and forecasting: Mining sequential patterns with constraints in large
databases

Jian Pei, Jiawei Han, Wei Wang 

November 2002 Proceedings of the eleventh international conference on Information 
and knowledge management 

Publisher: ACM Press 

Full text available: pdf(252.19 KB) Additional Information: full citation, abstract, references, citings, index terms

Constraints are essential for many sequential pattern mining applications. However, there 
is no systematic study on constraint-based sequential pattern mining. In this paper, we
investigate this issue and point out that the framework developed for constrained 
frequent-pattern mining does not fit our missions well. An extended framework is 
developed based on a sequential pattern growth methodology. Our study shows that 
constraints can be effectively and efficiently pushed deep into ... 

4 A template model for multidimensional inter-transactional association rules 
Ling Feng, Jeffrey Xu Yu, Hongjun Lu, Jiawei Han 

October 2002 The VLDB Journal - The International Journal on Very Large Data 

Bases, Volume 11 Issue 2 
Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(313.62 KB) Additional Information: full citation, abstract, index terms

Multidimensional inter-transactional association rules extend the traditional association 
rules to describe more general associations among items with multiple properties across 
transactions. "After McDonald and Burger King open branches, KFC will open a branch two 
months later and one mile away" is an example of such rules. Since the number of 
potential inter-transactional association rules tends to be extremely large, mining inter- 
transactional associations poses more challe ... 

Keywords: Intra-transactional/inter-transactional association rules, Multidimensional 
context, Template model 



Optimization of constrained frequent set queries with 2-variable constraints
Laks V. S. Lakshmanan, Raymond Ng, Jiawei Han, Alex Pang 

June 1999 ACM SIGMOD Record , Proceedings of the 1999 ACM SIGMOD international 

conference on Management of data SIGMOD '99, volume 28 issue 2 
Publisher: ACM Press 

Full text available: pdf(1.65 MB) Additional Information: full citation, abstract, references, citings, index terms

Currently, there is tremendous interest in providing ad-hoc mining capabilities in database 
management systems. As a first step towards this goal, in [15] we proposed an 
architecture for supporting constraint-based, human-centered, exploratory mining of 
various kinds of rules including associations, introduced the notion of constrained frequent 
set queries (CFQs), and developed effective pruning optimizations for CFQs with 1- 
variable (1-var) constraints. While 1-var constraints a ... 



6 Scalable data mining with model constraints 
Minos Garofalakis, Rajeev Rastogi 

December 2000 ACM SIGKDD Explorations Newsletter, Volume 2 issue 2 
Publisher: ACM Press 

Full text available: pdf(1.15 MB) Additional Information: full citation, index terms




Keywords: data mining constraints, decision tables, sequential patterns 



7 Mining the most interesting rules 
Roberto J. Bayardo, Rakesh Agrawal 

August 1999 Proceedings of the fifth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Publisher: ACM Press 

Full text available: pdf(1.29 MB) Additional Information: full citation, references, citings, index terms
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Tri-plots: scalable tools for multidimensional data mining
Agma Traina, Caetano Traina, Spiros Papadimitriou, Christos Faloutsos 
August 2001 Proceedings of the seventh ACM SIGKDD international conference on
Knowledge discovery and data mining
Publisher: ACM Press

Full text available: pdf(883.71 KB) Additional Information: full citation, abstract, references, citings, index terms

We focus on the problem of finding patterns across two large, multidimensional datasets. 
For example, given feature vectors of healthy and of non-healthy patients, we want to 
answer the following questions: Are the two clouds of points separable? What is the 
smallest/largest pair-wise distance across the two datasets? Which of the two clouds does 
a new point (feature vector) come from? We propose a new tool, the tri-plot, and its
generalization, the pq-plot, which help us answer the ... 

9 Research track poster: Integration of profile hidden Markov model output into
association rule mining

Christopher Besemann, Anne Denton

August 2005 Proceeding of the eleventh ACM SIGKDD international conference on
Knowledge discovery in data mining KDD '05
Publisher: ACM Press

Full text available: pdf(794.15 KB) Additional Information: full citation, abstract, references, index terms

Scientific models typically depend on parameters. Preserving the parameter dependence 
of models in the pattern mining context opens up several applications. Within association 
rule mining (ARM), the choice of parameters can be studied more flexibly than in
traditional model building. Studying support, confidence, and other rule metrics as a 
function of model parameters allows conclusions on assumptions underlying the models. 
We present efficient techniques to handle multiple model output da ... 

Keywords: association rule mining, model mining, profile hidden Markov model 



10 Multi Relational Data Mining (MRDM): State of the art of graph-based data mining 
Takashi Washio, Hiroshi Motoda

July 2003 ACM SIGKDD Explorations Newsletter, Volume 5 Issue 1
Publisher: ACM Press

Full text available: pdf(1.20 MB) Additional Information: full citation, abstract, references, citings

The need for mining structured data has increased in the past few years. One of the best 
studied data structures in computer science and discrete mathematics are graphs. It can 
therefore be no surprise that graph based data mining has become quite popular in the 
last few years. This article introduces the theoretical basis of graph based data mining and 
surveys the state of the art of graph-based data mining. Brief descriptions of some 
representative approaches are provided as well. 

Keywords: data mining, graph, graph-based data mining, path, structured data, tree 



11 The segment support map: scalable mining of frequent itemsets
Laks V. S. Lakshmanan, Carson Kai-Sang Leung, Raymond T. Ng
December 2000 ACM SIGKDD Explorations Newsletter, Volume 2 Issue 2
Publisher: ACM Press

Full text available: pdf(712.50 KB) Additional Information: full citation, citings, index terms
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12 Research track paper: On mining cross-graph quasi-cliques 
Jian Pei, Daxin Jiang, Aidong Zhang 

August 2005 Proceeding of the eleventh ACM SIGKDD international conference on 
Knowledge discovery in data mining KDD '05 

Publisher: ACM Press 

Full text available: pdf(573.85 KB) Additional Information: full citation, abstract, references, index terms

Joint mining of multiple data sets can often discover interesting, novel, and reliable 
patterns which cannot be obtained solely from any single source. For example, in cross- 
market customer segmentation, a group of customers who behave similarly in multiple 
markets should be considered as a more coherent and more reliable cluster than clusters 
found in a single market. As another example, in bioinformatics, by joint mining of gene 
expression data and protein interaction data, we can find cluster ... 

Keywords: bioinformatics, graph mining, patterns 



13 Research track posters: Generalizing the notion of support 
Michael Steinbach, Pang-Ning Tan, Hui Xiong, Vipin Kumar 

August 2004 Proceedings of the tenth ACM SIGKDD international conference on 
Knowledge discovery and data mining KDD '04 

Publisher: ACM Press 

Full text available: pdf(194.35 KB) Additional Information: full citation, abstract, references, index terms

The goal of this paper is to show that generalizing the notion of support can be useful in 
extending association analysis to non-traditional types of patterns and non-binary data. 
To that end, we describe a framework for generalizing support that is based on the 
simple, but useful observation that support can be viewed as the composition of two 
functions: a function that evaluates the strength or presence of a pattern in each object 
(transaction) and a function that summarizes these evaluations ... 

Keywords: association analysis, hyperclique, support 



14 A perspective on inductive databases 
Luc De Raedt

December 2002 ACM SIGKDD Explorations Newsletter, Volume 4 Issue 2
Publisher: ACM Press

Full text available: pdf(179.64 KB) Additional Information: full citation, abstract, references, citings

Inductive databases tightly integrate databases with data mining. The key ideas are that 
data and patterns (or models) are handled in the same way and that an inductive query 
language allows the user to query and manipulate the patterns (or models) of 
interest. This paper proposes a simple and abstract model for inductive databases. We
describe the basic formalism, a simple but fairly powerful inductive query language, some 
basics of reasoning for query optimization, and discuss some memory organ ... 

Keywords: constraint-based mining, inductive databases, inductive querying 

15 Scalable algorithms for mining large databases 
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August 1999 Tutorial notes of the fifth ACM SIGKDD international conference on
Knowledge discovery and data mining
Publisher: ACM Press

Full text available: pdf(4.11 MB) Additional Information: full citation, references, citings, index terms



16 Frequent patterns I: Selecting the right interestingness measure for association
patterns

Pang-Ning Tan, Vipin Kumar, Jaideep Srivastava

July 2002 Proceedings of the eighth ACM SIGKDD international conference on
Knowledge discovery and data mining
Publisher: ACM Press

Full text available: pdf(937.37 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Many techniques for association rule mining and feature selection require a suitable 
metric to capture the dependencies among variables in a data set. For example, metrics 
such as support, confidence, lift, correlation, and collective strength are often used to 
determine the interestingness of association patterns. However, many such measures 
provide conflicting information about the interestingness of a pattern, and the best metric 
to use for a given application domain is rarely known. In this ... 

Keywords: associations, contingency tables, interestingness measure 



17 Logical foundations of object-oriented and frame-based languages 
Michael Kifer, Georg Lausen, James Wu 
July 1995 Journal of the ACM (JACM), Volume 42 Issue 4 
Publisher: ACM Press 

Full text available: pdf(7.52 MB) Additional Information: full citation, abstract, references, citings, index terms, review

We propose a novel formalism, called Frame Logic (abbr., F-logic), that accounts in a 
clean and declarative fashion for most of the structural aspects of object-oriented and 
frame-based languages. These features include object identity, complex objects, 
inheritance, polymorphic types, query methods, encapsulation, and others. In a sense, F- 
logic stands in the same relationship to the object-oriented paradigm as classical 
predicate calculus stands to relational programming. ... 

Keywords: deductive databases, frame-based languages, logic programming, 
nonmonotonic inheritance, object-oriented programming, proof theory, semantics, typing 



18 Exploratory mining via constrained frequent set queries 
Raymond Ng, Laks V. S. Lakshmanan, Jiawei Han, Teresa Mah 

June 1999 ACM SIGMOD Record, Proceedings of the 1999 ACM SIGMOD international
conference on Management of data SIGMOD '99, Volume 28 Issue 2
Publisher: ACM Press

Full text available: pdf(353.58 KB) Additional Information: full citation, abstract, references, citings, index terms

Although there have been many studies on data mining, to date there have been few 
research prototypes or commercial systems supporting comprehensive query-driven 
mining, which encourages interactive exploration of the data. Our thesis is that constraint 
constructs and the optimization they induce play a pivotal role in mining queries, thus 
substantially enhancing the usefulness and performance of the mining system. This is 
based on the analogy of declarative query languages like SQL and que ... 
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19 Multi Relational Data Mining (MRDM): Biological applications of multi-relational data
mining

David Page, Mark Craven

July 2003 ACM SIGKDD Explorations Newsletter, Volume 5 Issue 1
Publisher: ACM Press

Full text available: pdf(1.12 MB) Additional Information: full citation, abstract, references, citings

Biological databases contain a wide variety of data types, often with rich relational 
structure. Consequently multi-relational data mining techniques frequently are applied to 
biological data. This paper presents several applications of multi-relational data mining to 
biological data, taking care to cover a broad range of multi-relational data mining 
techniques. 
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Exploratory mining and pruning optimizations of constrained associations rules 
Raymond T. Ng, Laks V. S. Lakshmanan, Jiawei Han, Alex Pang 

June 1998 ACM SIGMOD Record, Proceedings of the 1998 ACM SIGMOD international
conference on Management of data SIGMOD '98, Volume 27 Issue 2
Publisher: ACM Press

Full text available: pdf(1.65 MB) Additional Information: full citation, abstract, references, citings, index terms



From the standpoint of supporting human-centered discovery of knowledge, the present- 
day model of mining association rules suffers from the following serious shortcomings: (i) 
lack of user exploration and control, (ii) lack of focus, and (iii) rigid notion of 
relationships. In effect, this model functions as a black-box, admitting little user 
interaction in between. We propose, in this paper, an architecture that opens up the 
black-box, and supports constraint-based, human-centered explorat ... 



2 Multi Relational Data Mining (MRDM): Scalability and efficiency in multi-relational
data mining

Hendrik Blockeel, Michele Sebag

July 2003 ACM SIGKDD Explorations Newsletter, Volume 5 Issue 1
Publisher: ACM Press

Full text available: pdf(1.61 MB) Additional Information: full citation, abstract, references, citings

Efficiency and scalability have always been important concerns in the field of data mining,
and are even more so in the multi-relational context, which is inherently more complex.
The issue has been receiving an increasing amount of attention during the last few years,
and quite a number of theoretical results, algorithms and implementations have been
presented that explicitly aim at improving the efficiency and scalability of multi-relational
data mining approaches. With this article we attempt ... 

3 Pattern discovery and forecasting: Mining sequential patterns with constraints in large
databases

Jian Pei, Jiawei Han, Wei Wang

November 2002 Proceedings of the eleventh international conference on Information
and knowledge management
Publisher: ACM Press

Full text available: pdf(252.19 KB) Additional Information: full citation, abstract, references, citings, index terms

Constraints are essential for many sequential pattern mining applications. However, there 
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is no systematic study on constraint-based sequential pattern mining. In this paper, we 
investigate this issue and point out that the framework developed for constrained 
frequent-pattern mining does not fit our missions well. An extended framework is 
developed based on a sequential pattern growth methodology. Our study shows that 
constraints can be effectively and efficiently pushed deep into ... 

4 A template model for multidimensional inter-transactional association rules 
Ling Feng, Jeffrey Xu Yu, Hongjun Lu, Jiawei Han 

October 2002 The VLDB Journal — The International Journal on Very Large Data 

Bases, Volume 11 Issue 2
Publisher: Springer-Verlag New York, Inc.

Full text available: pdf(313.62 KB) Additional Information: full citation, abstract, index terms

Multidimensional inter-transactional association rules extend the traditional association 
rules to describe more general associations among items with multiple properties across 
transactions. "After McDonald and Burger King open branches, KFC will open a branch two 
months later and one mile away" is an example of such rules. Since the number of 
potential inter-transactional association rules tends to be extremely large, mining inter- 
transactional associations poses more challe ... 

Keywords: Intra-transactional/inter-transactional association rules, Multidimensional 
context, Template model 



Optimization of constrained frequent set queries with 2-variable constraints 
Laks V. S. Lakshmanan, Raymond Ng, Jiawei Han, Alex Pang 

June 1999 ACM SIGMOD Record, Proceedings of the 1999 ACM SIGMOD international
conference on Management of data SIGMOD '99, Volume 28 Issue 2
Publisher: ACM Press

Full text available: pdf(1.65 MB) Additional Information: full citation, abstract, references, citings, index terms

Currently, there is tremendous interest in providing ad-hoc mining capabilities in database 
management systems. As a first step towards this goal, in [15] we proposed an 
architecture for supporting constraint-based, human-centered, exploratory mining of 
various kinds of rules including associations, introduced the notion of constrained frequent 
set queries (CFQs), and developed effective pruning optimizations for CFQs with 1- 
variable (1-var) constraints. While 1-var constraints a ... 

Scalable data mining with model constraints
Minos Garofalakis, Rajeev Rastogi

December 2000 ACM SIGKDD Explorations Newsletter, Volume 2 Issue 2
Publisher: ACM Press

Full text available: pdf(1.15 MB) Additional Information: full citation, index terms



Keywords: data mining constraints, decision tables, sequential patterns 



Mining the most interesting rules 
Roberto J. Bayardo, Rakesh Agrawal 

August 1999 Proceedings of the fifth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Publisher: ACM Press 

Full text available: pdf(1.29 MB) Additional Information: full citation, references, citings, index terms
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Tri-plots: scalable tools for multidimensional data mining
Agma Traina, Caetano Traina, Spiros Papadimitriou, Christos Faloutsos
August 2001 Proceedings of the seventh ACM SIGKDD international conference on
Knowledge discovery and data mining
Publisher: ACM Press

Full text available: pdf(883.71 KB) Additional Information: full citation, abstract, references, citings, index
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We focus on the problem of finding patterns across two large, multidimensional datasets. 
For example, given feature vectors of healthy and of non-healthy patients, we want to 
answer the following questions: Are the two clouds of points separable? What is the 
smallest/largest pair-wise distance across the two datasets? Which of the two clouds does 
a new point (feature vector) come from? We propose a new tool, the tri-plot, and its
generalization, the pq-plot, which help us answer the ... 

9 Research track poster: Integration of profile hidden Markov model output into
association rule mining
Christopher Besemann, Anne Denton

August 2005 Proceeding of the eleventh ACM SIGKDD international conference on
Knowledge discovery in data mining KDD '05
Publisher: ACM Press

Full text available: pdf(794.15 KB) Additional Information: full citation, abstract, references, index terms

Scientific models typically depend on parameters. Preserving the parameter dependence 
of models in the pattern mining context opens up several applications. Within association 
rule mining (ARM), the choice of parameters can be studied more flexibly than in
traditional model building. Studying support, confidence, and other rule metrics as a 
function of model parameters allows conclusions on assumptions underlying the models. 
We present efficient techniques to handle multiple model output da ... 

Keywords: association rule mining, model mining, profile hidden Markov model 



10 Multi Relational Data Mining (MRDM): State of the art of graph-based data mining

Takashi Washio, Hiroshi Motoda
July 2003 ACM SIGKDD Explorations Newsletter, Volume 5 Issue 1
Publisher: ACM Press

Full text available: pdf(1.20 MB) Additional Information: full citation, abstract, references, citings

The need for mining structured data has increased in the past few years. One of the best 
studied data structures in computer science and discrete mathematics are graphs. It can 
therefore be no surprise that graph based data mining has become quite popular in the 
last few years. This article introduces the theoretical basis of graph based data mining and
surveys the state of the art of graph-based data mining. Brief descriptions of some 
representative approaches are provided as well. 

Keywords: data mining, graph, graph-based data mining, path, structured data, tree 



11 The segment support map: scalable mining of frequent itemsets
Laks V. S. Lakshmanan, Carson Kai-Sang Leung, Raymond T. Ng
December 2000 ACM SIGKDD Explorations Newsletter, Volume 2 Issue 2
Publisher: ACM Press

Full text available: pdf(712.50 KB) Additional Information: full citation, citings, index terms
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12 Research track paper: On mining cross-graph quasi-cliques 
Jian Pei, Daxin Jiang, Aidong Zhang 

August 2005 Proceeding of the eleventh ACM SIGKDD international conference on 
Knowledge discovery in data mining KDD '05 

Publisher: ACM Press 

Full text available: pdf(573.85 KB) Additional Information: full citation, abstract, references, index terms

Joint mining of multiple data sets can often discover interesting, novel, and reliable 
patterns which cannot be obtained solely from any single source. For example, in cross- 
market customer segmentation, a group of customers who behave similarly in multiple 
markets should be considered as a more coherent and more reliable cluster than clusters 
found in a single market. As another example, in bioinformatics, by joint mining of gene 
expression data and protein interaction data, we can find cluster ... 

Keywords: bioinformatics, graph mining, patterns 




13 Research track posters: Generalizing the notion of support
Michael Steinbach, Pang-Ning Tan, Hui Xiong, Vipin Kumar

August 2004 Proceedings of the tenth ACM SIGKDD international conference on
Knowledge discovery and data mining KDD '04
Publisher: ACM Press

Full text available: pdf(194.35 KB) Additional Information: full citation, abstract, references, index terms

The goal of this paper is to show that generalizing the notion of support can be useful in 
extending association analysis to non-traditional types of patterns and non-binary data. 
To that end, we describe a framework for generalizing support that is based on the 
simple, but useful observation that support can be viewed as the composition of two 
functions: a function that evaluates the strength or presence of a pattern in each object 
(transaction) and a function that summarizes these evaluations ... 

Keywords: association analysis, hyperclique, support 



14 A perspective on inductive databases
Luc De Raedt

December 2002 ACM SIGKDD Explorations Newsletter, Volume 4 Issue 2
Publisher: ACM Press

Full text available: pdf(179.64 KB) Additional Information: full citation, abstract, references, citings

Inductive databases tightly integrate databases with data mining. The key ideas are that 
data and patterns (or models) are handled in the same way and that an inductive query 
language allows the user to query and manipulate the patterns (or models) of 
interest. This paper proposes a simple and abstract model for inductive databases. We
describe the basic formalism, a simple but fairly powerful inductive query language, some 
basics of reasoning for query optimization, and discuss some memory organ ... 

Keywords: constraint-based mining, inductive databases, inductive querying 
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August 1999 Tutorial notes of the fifth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Publisher: ACM Press 

Full text available: pdf(4.11 MB) Additional Information: full citation, references, citings, index terms



16 Frequent patterns I: Selecting the right interestingness measure for association 
patterns 

Pang-Ning Tan, Vipin Kumar, Jaideep Srivastava 

July 2002 Proceedings of the eighth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Publisher: ACM Press 

Full text available: pdf(937.37 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Many techniques for association rule mining and feature selection require a suitable 
metric to capture the dependencies among variables in a data set. For example, metrics 
such as support, confidence, lift, correlation, and collective strength are often used to 
determine the interestingness of association patterns. However, many such measures 
provide conflicting information about the interestingness of a pattern, and the best metric 
to use for a given application domain is rarely known. In this ... 

Keywords: associations, contingency tables, interestingness measure 




17 Logical foundations of object-oriented and frame-based languages 
Michael Kifer, Georg Lausen, James Wu
July 1995 Journal of the ACM (JACM), Volume 42 Issue 4
Publisher: ACM Press

Full text available: pdf(7.52 MB) Additional Information: full citation, abstract, references, citings, index terms, review

We propose a novel formalism, called Frame Logic (abbr., F-logic), that accounts in a 
clean and declarative fashion for most of the structural aspects of object-oriented and 
frame-based languages. These features include object identity, complex objects, 
inheritance, polymorphic types, query methods, encapsulation, and others. In a sense, F- 
logic stands in the same relationship to the object-oriented paradigm as classical 
predicate calculus stands to relational programming. ... 

Keywords: deductive databases, frame-based languages, logic programming, 
nonmonotonic inheritance, object-oriented programming, proof theory, semantics, typing 



18 Exploratory mining via constrained frequent set queries 
Raymond Ng, Laks V. S. Lakshmanan, Jiawei Han, Teresa Mah

June 1999 ACM SIGMOD Record, Proceedings of the 1999 ACM SIGMOD international
conference on Management of data SIGMOD '99, Volume 28 Issue 2
Publisher: ACM Press

Full text available: pdf(353.58 KB) Additional Information: full citation, abstract, references, citings, index terms

Although there have been many studies on data mining, to date there have been few 
research prototypes or commercial systems supporting comprehensive query-driven 
mining, which encourages interactive exploration of the data. Our thesis is that constraint 
constructs and the optimization they induce play a pivotal role in mining queries, thus 
substantially enhancing the usefulness and performance of the mining system. This is 
based on the analogy of declarative query languages like SQL and que ... 
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July 2003 ACM SIGKDD Explorations Newsletter, Volume 5 Issue 1
Publisher: ACM Press

Full text available: pdf(1.12 MB) Additional Information: full citation, abstract, references, citings

Biological databases contain a wide variety of data types, often with rich relational 
structure. Consequently multi-relational data mining techniques frequently are applied to 
biological data. This paper presents several applications of multi-relational data mining to 
biological data, taking care to cover a broad range of multi-relational data mining 
techniques. 

20 Computing curricula 2001 

September 2001 Journal on Educational Resources in Computing (JERIC)
Publisher: ACM Press

Full text available: pdf(613.63 KB), html(2.78 KB) Additional Information: full citation, references, citings, index terms



Results 1 - 20 of 200

The ACM Portal is published by the Association for Computing Machinery. Copyright © 2006 ACM, Inc. 



http://portal.acm.org/results.cfm?coll=ACM&dl=ACM&CFID=65752986&CF 2/23/06 



Results (page 1): anti monotonicity property 






Terms used anti monotonicity property 




Found 4,158 of 171,143 




Results 1 - 20 of 200 
Best 200 shown 






Exploratory mining and pruning optimizations of constrained associations rules 
Raymond T. Ng, Laks V. S. Lakshmanan, Jiawei Han, Alex Pang 

June 1998 ACM SIGMOD Record, Proceedings of the 1998 ACM SIGMOD international
conference on Management of data SIGMOD '98, Volume 27 Issue 2
Publisher: ACM Press

Additional Information: full citation, abstract, references, citings, index
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From the standpoint of supporting human-centered discovery of knowledge, the present- 
day model of mining association rules suffers from the following serious shortcomings: (i) 
lack of user exploration and control, (ii) lack of focus, and (iii) rigid notion of 
relationships. In effect, this model functions as a black-box, admitting little user 
interaction in between. We propose, in this paper, an architecture that opens up the 
black-box, and supports constraint-based, human-centered explorat ... 

Pattern discovery and forecasting: Mining sequential patterns with constraints in large
databases

Jian Pei, Jiawei Han, Wei Wang

November 2002 Proceedings of the eleventh international conference on Information
and knowledge management
Publisher: ACM Press

Additional Information: full citation, abstract, references, citings, index terms

Constraints are essential for many sequential pattern mining applications. However, there 
is no systematic study on constraint-based sequential pattern mining. In this paper, we 
investigate this issue and point out that the framework developed for constrained 
frequent-pattern mining does not fit our missions well. An extended framework is 
developed based on a sequential pattern growth methodology. Our study shows that 
constraints can be effectively and efficiently pushed deep into ... 
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Currently, there is tremendous interest in providing ad-hoc mining capabilities in database 
management systems. As a first step towards this goal, in [15] we proposed an 
architecture for supporting constraint-based, human-centered, exploratory mining of 
various kinds of rules including associations, introduced the notion of constrained frequent 
set queries (CFQs), and developed effective pruning optimizations for CFQs with 1- 
variable (1-var) constraints. While 1-var constraints a ... 

Routing: Metarouting
Timothy G. Griffin, Joao Luis Sobrinho

August 2005 Proceedings of the 2005 conference on Applications, technologies,
architectures, and protocols for computer communications SIGCOMM '05
Publisher: ACM Press

Full text available: pdf(258.67 KB) Additional Information: full citation, abstract, references, index terms

There is a shortage of routing protocols that meet the needs of network engineers. This 
has led to BGP being pressed into service as an IGP, despite its lack of convergence 
guarantees. The development, standardization, and deployment of routing protocols, or 
even minor changes to existing protocols, are very difficult tasks. We present an approach
called Metarouting that defines routing protocols using a high-level and declarative 
language. Once an interpreter for a metarouting language is imple ... 

Keywords: algebraic routing, path algebras, routing protocols 



Logical foundations of object-oriented and frame-based languages

Michael Kifer, Georg Lausen, James Wu

July 1995 Journal of the ACM (JACM), Volume 42 Issue 4
Publisher: ACM Press

Full text available: pdf(7.52 MB) Additional Information: full citation, abstract, references, citings, index terms, review

We propose a novel formalism, called Frame Logic (abbr., F-logic), that accounts in a 
clean and declarative fashion for most of the structural aspects of object-oriented and 
frame-based languages. These features include object identity, complex objects, 
inheritance, polymorphic types, query methods, encapsulation, and others. In a sense, F- 
logic stands in the same relationship to the object-oriented paradigm as classical 
predicate calculus stands to relational programming. ... 

Keywords: deductive databases, frame-based languages, logic programming, 
nonmonotonic inheritance, object-oriented programming, proof theory, semantics, typing 
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8 A perspective on inductive databases 
Luc De Raedt 

December 2002 ACM SIGKDD Explorations Newsletter, Volume 4 issue 2 
Publisher: ACM Press 

Full text available: pdf(179.64 KB) Additional Information: full citation, abstract, references, citings

Inductive databases tightly integrate databases with data mining. The key ideas are that 
data and patterns (or models) are handled in the same way and that an inductive query 
language allows the user to query and manipulate the patterns (or models) of 
interest. This paper proposes a simple and abstract model for inductive databases. We
describe the basic formalism, a simple but fairly powerful inductive query language, some 
basics of reasoning for query optimization, and discuss some memory organ ... 

Keywords: constraint-based mining, inductive databases, inductive querying 



9 Frequent patterns I: Selecting the right interestingness measure for association 
patterns 

Pang-Ning Tan, Vipin Kumar, Jaideep Srivastava 

July 2002 Proceedings of the eighth ACM SIGKDD international conference on 
Knowledge discovery and data mining 
Publisher: ACM Press 

Full text available: pdf(937.37 KB) Additional Information: full citation, abstract, references, citings, index terms, review 

Many techniques for association rule mining and feature selection require a suitable 
metric to capture the dependencies among variables in a data set. For example, metrics 
such as support, confidence, lift, correlation, and collective strength are often used to 
determine the interestingness of association patterns. However, many such measures 
provide conflicting information about the interestingness of a pattern, and the best metric 
to use for a given application domain is rarely known. In this ... 

Keywords: associations, contingency tables, interestingness measure 



10 Tri-plots: scalable tools for multidimensional data mining 

Agma Traina, Caetano Traina, Spiros Papadimitriou, Christos Faloutsos 
August 2001 Proceedings of the seventh ACM SIGKDD international conference on 
Knowledge discovery and data mining 
Publisher: ACM Press 
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We focus on the problem of finding patterns across two large, multidimensional datasets. 
For example, given feature vectors of healthy and of non-healthy patients, we want to 
answer the following questions: Are the two clouds of points separable? What is the 
smallest/largest pair-wise distance across the two datasets? Which of the two clouds does 
a new point (feature vector) come from? We propose a new tool, the tri-plot, and its 
generalization, the pq-plot, which help us answer the ... 
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(TOPLAS), Volume 15 Issue 4 
Full text available: pdf(3.29 MB) Additional Information: full citation, abstract, references, citings, index terms 

We investigate the interactions of subtyping and recursive types, in a simply typed 
λ-calculus. The two fundamental questions here are whether two (recursive) types are in the 
subtype relation and whether a term has a type. To address the first question, we relate 
various definitions of type equivalence and subtyping that are induced by a model, an 
ordering on infinite trees, an algorithm, and a set of type rules. We show soundness and 
completeness among the rules, the algorithm, and the ... 

Keywords: coercions, lambda-calculus, partial-equivalence relations, recursive types, 
regular trees, subtyping, tree orderings, type equivalence, typechecking algorithm 



12 Searching for dependencies at multiple abstraction levels 
Toon Calders, Raymond T. Ng, Jef Wijsen 
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Publisher: ACM Press 

Full text available: pdf(411.24 KB) Additional Information: full citation, abstract, references, citings, index terms 
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The notion of roll-up dependency (RUD) extends functional dependencies with 
generalization hierarchies. RUDs can be applied in OLAP and database design. The 
problem of discovering RUDs in large databases is at the center of this paper. An 
algorithm is provided that relies on a number of theoretical results. The algorithm has 
been implemented; results on two real-life datasets are given. The extension of functional 
dependency (FD) with roll-ups turns out to capture meaningful rules that are outsi ... 

Keywords: Data mining, functional dependencies, knowledge discovery, online analytical 
processing 
November 1998 Journal of the ACM (JACM), Volume 45 issue 6 
Publisher: ACM Press 

Full text available: pdf(279.40 KB) Additional Information: full citation, abstract, references, citings, index terms 

We propose inference systems for binary relations that satisfy composition laws such as 
transitivity. Our inference mechanisms are based on standard techniques from term 
rewriting and represent a refinement of chaining methods as they are used in the context 
of resolution-type theorem proving. We establish the refutational completeness of these 
calculi and prove that our methods are compatible with the usual simplification techniques 
employed in refutational theorem provers, such as subsump ... 

Keywords: chaining calculi, equational logic, reduction orderings, rewrite systems, term 
rewriting, transitive relations 
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The emergence of the Web has increased interests in XML data. XML query languages 
such as XQuery and XPath use label paths to traverse the irregularly structured data. 
Without a structural summary and efficient indexes, query processing can be quite 
inefficient due to an exhaustive traversal on XML data. To overcome the inefficiency, 
several path indexes have been proposed in the research community. Traditional indexes 
generally record all label paths from the root element in XML data. Such path ... 

15 Research track posters: Generalizing the notion of support 
Michael Steinbach, Pang-Ning Tan, Hui Xiong, Vipin Kumar 

August 2004 Proceedings of the tenth ACM SIGKDD international conference on 
Knowledge discovery and data mining KDD '04 

Publisher: ACM Press 

Full text available: pdf(194.35 KB) Additional Information: full citation, abstract, references, index terms 

The goal of this paper is to show that generalizing the notion of support can be useful in 
extending association analysis to non-traditional types of patterns and non-binary data. 
To that end, we describe a framework for generalizing support that is based on the 
simple, but useful observation that support can be viewed as the composition of two 
functions: a function that evaluates the strength or presence of a pattern in each object 
(transaction) and a function that summarizes these evaluations ... 
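
The composition described in this abstract can be illustrated with a minimal sketch. It is not the authors' framework: the names generalized_support, contains and mean, and the sample transactions, are assumptions introduced here only for illustration. With a 0/1 containment test as the evaluation function and the mean as the summarization function, the composition reduces to the traditional support of an itemset.

```python
from typing import Callable, FrozenSet, List, Sequence


def generalized_support(
    transactions: Sequence[FrozenSet[str]],
    evaluate: Callable[[FrozenSet[str]], float],
    summarize: Callable[[List[float]], float],
) -> float:
    """Compose a per-transaction evaluation with a summarization (hypothetical helper)."""
    return summarize([evaluate(t) for t in transactions])


def contains(itemset: FrozenSet[str]) -> Callable[[FrozenSet[str]], float]:
    """Evaluation function: 1.0 if the transaction contains every item of the itemset."""
    def evaluate(transaction: FrozenSet[str]) -> float:
        return 1.0 if itemset <= transaction else 0.0
    return evaluate


def mean(values: List[float]) -> float:
    """Summarization function: average of the per-transaction evaluations."""
    return sum(values) / len(values) if values else 0.0


# Illustrative data only (not taken from the paper).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "beer"}),
    frozenset({"bread", "milk", "beer"}),
    frozenset({"milk"}),
]

# Traditional support of {bread, milk}: fraction of transactions containing both items.
support = generalized_support(transactions, contains(frozenset({"bread", "milk"})), mean)
```

Swapping in a different evaluation function (for example, one that returns a graded value for non-binary data) or a different summarization function changes the measure without changing the composition itself.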
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Although there have been many studies on data mining, to date there have been few 
research prototypes or commercial systems supporting comprehensive query-driven 
mining, which encourages interactive exploration of the data. Our thesis is that constraint 
constructs and the optimization they induce play a pivotal role in mining queries, thus 
substantially enhancing the usefulness and performance of the mining system. This is 
based on the analogy of declarative query languages like SQL and que ... 
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Scientific models typically depend on parameters. Preserving the parameter dependence 
of models in the pattern mining context opens up several applications. Within association 
rule mining (ARM), the choice of parameters can be studied more flexibly than in 
traditional model building. Studying support, confidence, and other rule metrics as a 
function of model parameters allows conclusions on assumptions underlying the models. 
We present efficient techniques to handle multiple model output da ... 
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Efficiency and Scalability have always been important concerns in the field of data mining, 
and are even more so in the multi-relational context, which is inherently more complex. 
The issue has been receiving an increasing amount of attention during the last few years, 
and quite a number of theoretical results, algorithms and implementations have been 
presented that explicitly aim at improving the efficiency and Scalability of multi-relational 
data mining approaches. With this article we attempt ... 
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